Computational filtering of methylated sequence data for predictive modeling

ABSTRACT

Computational techniques are disclosed for using methylation profiles to classify the medical condition of a person. Initial sequence data is obtained containing sequences of an initial set of nucleic acids from a biological sample of a person. The initial sequence data is filtered to generate filtered sequence data that describes sequences of a filtered subset of nucleic acids from the biological sample. A methylation profile is determined for the filtered subset of nucleic acids from the biological sample. The methylation profile can be processed to determine a likelihood that the person has a specified medical condition. The system outputs an indication of the likelihood that the person has the specified medical condition.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims the benefit of priority to U.S. Application Ser. No. 62/832,157, filed on Apr. 10, 2019, U.S. Application Ser. No. 62/882,215, filed on Aug. 2, 2019, U.S. Application Ser. No. 62/928,156, filed on Oct. 30, 2019, U.S. Application Ser. No. 63/007,204, filed on Apr. 8, 2020, U.S. Application Ser. No. 63/007,208, filed on Apr. 8, 2020, and U.S. Application Ser. No. 63/007,218, filed on Apr. 8, 2020. The disclosures of each of these applications are considered part of the disclosure of the present document, and are each incorporated by reference in their entireties.

GOVERNMENT FUNDING STATEMENT

This invention was made with government support under grant number HD068578 awarded by the National Institutes of Health. The government has certain rights in the invention.

BACKGROUND

1. Technical Field

This document relates to systems and methods for classifying a condition of a mammal (e.g., a human) using predictive models (e.g., machine-learning models) that process methylation patterns of DNA obtained from a biological sample. Certain implementations of the techniques described herein employ computational methods to perform filtering operations on an input data set that can improve efficiency of the prediction or classification task while increasing model sensitivity.

2. Background Information

Machine learning involves the analysis of data samples to identify features and patterns in the data samples that can be employed to perform tasks such as classification and prediction without explicit instructions. Some machine-learning techniques determine a model for performing the desired task, such as a neural network, a regression model, a decision tree, a support vector machine, or a naïve Bayes classifier.

Methylation patterns in a mammal's DNA have been correlated with certain medical conditions or phenotypes of the mammal. However, data sets describing DNA sequences and/or methylation patterns are typically quite large and can be computationally expensive to process.

SUMMARY

This document describes systems, methods, devices, and other techniques for training and applying models to classify a medical condition of a person or other mammal based on methylation patterns occurring in DNA sequences of the person. In some aspects, the disclosed techniques employ a filtering operation to enrich a subset of sequences represented in an initial data set based on methylation characteristics and/or copy number characteristics of the nucleic acids. The filtering operation can, in some embodiments, achieve advantages including reducing the size of the input data set (e.g., a methylation profile) provided to the classifier (e.g., a machine-learning model), decreasing the time and computational expense required to process the input data set, and/or improving the sensitivity of the model to the patterns that have the highest predictive power in differentiating persons who have a normal condition from persons who have an abnormal condition.

In some aspects, the disclosed techniques can involve identifying a set of reference CpG sites of abnormal individuals. Computing systems can estimate either restricted reference component methylomes or mixture methylomes that are independent linear combinations of certain reference component methylomes. The proportions of these components at the reference CpG sites for the tested biological samples can further be estimated, and the system can then predict the methylation level of the tested biological samples at a target set of CpG sites under the hypothesis that the sample is from a normal individual. The predicted methylation levels can then be compared against the observed methylation levels, and a classification for the individual as either exhibiting a normal or abnormal condition with respect to the specified medical condition can then be determined.

Further implementations of the disclosed subject matter include methods performed by a computing system. The system can obtain initial sequence data that describes sequences of an initial set of nucleic acids from a biological sample of a person, the initial set of nucleic acids including nucleic acids originating from multiple different tissues of the person. The system can filter the initial sequence data to generate filtered sequence data that describes sequences of a filtered subset of nucleic acids from the biological sample. Filtering can include (i) selecting target nucleic acids from the initial set of nucleic acids based on at least one of a methylation characteristic or a copy number characteristic of the target nucleic acids and (ii) enriching the target nucleic acids in the filtered subset. A methylation profile can be determined for the filtered subset of nucleic acids from the biological sample. The system processes the methylation profile for the filtered subset of nucleic acids to determine a likelihood that the person has a specified medical condition, and an indication of the likelihood that the person has the specified medical condition can then be provided as an output of the system.

These and other implementations can further include one or more of the following features.

The system can identify a pre-defined set of genomic regions (i.e., genomic loci). Selecting target nucleic acids from the initial set of nucleic acids can include comparing nucleic acid sequences from the initial set of nucleic acids to sequences from the pre-defined set of genomic regions. Enriching the target nucleic acids in the filtered subset can include discarding nucleic acid sequences from the initial sequence data that are not among the sequences from the pre-defined set of genomic regions, while retaining nucleic acid sequences from the initial sequence data that are among the sequences from the pre-defined set of genomic regions. At least a first subset of the pre-defined set of genomic regions can be defined based on those regions in the first subset exhibiting a minimum level of stability with respect to at least one of the methylation characteristic or the copy number characteristic in a population of individuals.
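The region-based filter described above can be sketched as follows. This is a minimal illustration that assumes the reads have already been aligned to a reference genome, so that "comparing sequences to the pre-defined regions" reduces to a coordinate-overlap test; the `Read` type, the 0-based half-open coordinates, and the function names are hypothetical, and the overlap test assumes that regions on the same chromosome do not overlap each other.

```python
from bisect import bisect_left
from typing import Dict, Iterable, List, NamedTuple, Tuple

class Read(NamedTuple):
    """Hypothetical aligned read with 0-based, half-open coordinates [start, end)."""
    chrom: str
    start: int
    end: int

Region = Tuple[str, int, int]  # (chrom, start, end), 0-based, half-open

def build_index(regions: Iterable[Region]) -> Dict[str, Tuple[List[int], List[int]]]:
    """Sort regions per chromosome into parallel start/end lists.

    Assumes regions on the same chromosome do not overlap one another.
    """
    grouped: Dict[str, List[Tuple[int, int]]] = {}
    for chrom, start, end in regions:
        grouped.setdefault(chrom, []).append((start, end))
    index = {}
    for chrom, intervals in grouped.items():
        intervals.sort()
        index[chrom] = ([s for s, _ in intervals], [e for _, e in intervals])
    return index

def overlaps(read: Read, index: Dict[str, Tuple[List[int], List[int]]]) -> bool:
    """True if the read overlaps at least one pre-defined region."""
    if read.chrom not in index:
        return False
    starts, ends = index[read.chrom]
    # Index of the last region whose start lies before the read's end.
    i = bisect_left(starts, read.end) - 1
    return i >= 0 and ends[i] > read.start

def filter_reads(reads: Iterable[Read], regions: Iterable[Region]) -> List[Read]:
    """Retain reads that fall in the pre-defined regions; discard all others."""
    index = build_index(regions)
    return [r for r in reads if overlaps(r, index)]

regions = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 0, 50)]
reads = [Read("chr1", 150, 250), Read("chr1", 300, 400), Read("chr1", 590, 700),
         Read("chr2", 49, 60), Read("chr3", 0, 10), Read("chr1", 200, 300)]
kept = filter_reads(reads, regions)
```

Using a sorted index with binary search keeps each lookup logarithmic in the number of regions, which matters when the initial sequence data contains many millions of reads.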

The biological sample can be plasma, and the initial set of nucleic acids can include cell-free DNA in the plasma.

The method can further include actions of identifying a set of restricted reference component methylomes in the initial set or filtered subset of nucleic acids; identifying a set of reference component methylomes; determining a proportion of the reference component methylomes at a reference set of CpG sites in the initial set or filtered subset of nucleic acids; generating predictions of methylation levels at a target set of CpG sites in the initial set or filtered subset of nucleic acids; comparing the predictions of methylation levels at the target set of CpG sites to observed methylation levels; and determining whether the person likely has or does not have the specified medical condition based on the comparison.
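The proportion-estimation and prediction steps above can be sketched as a linear mixture model. This is a simplified illustration, not the disclosed estimator: it assumes the sample's methylation at the reference CpG sites is a weighted mix of the component methylomes, fits the weights by ordinary least squares (clipping negatives and renormalizing, a swapped-in simplification where a constrained solver such as non-negative least squares may be preferable), and then predicts target-site methylation from the same weights. All data and function names are hypothetical.

```python
import numpy as np

def estimate_proportions(ref_components: np.ndarray, observed_ref: np.ndarray) -> np.ndarray:
    """Estimate how much each reference component methylome contributes to the sample.

    ref_components: (n_reference_sites, n_components) methylation levels in [0, 1].
    observed_ref:   (n_reference_sites,) observed methylation levels of the sample.
    Ordinary least squares, then negative weights are clipped and the result is
    renormalized to sum to 1 (a simplification of a constrained fit).
    """
    props, *_ = np.linalg.lstsq(ref_components, observed_ref, rcond=None)
    props = np.clip(props, 0.0, None)
    total = props.sum()
    if total <= 0:
        raise ValueError("could not estimate non-negative component proportions")
    return props / total

def predict_target_levels(target_components: np.ndarray, props: np.ndarray) -> np.ndarray:
    """Predict methylation at target CpG sites under the 'normal individual' hypothesis."""
    return target_components @ props

# Hypothetical toy data: 4 reference sites, 3 target sites, 2 components.
ref_components = np.array([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.5, 0.5]])
target_components = np.array([[0.2, 0.8], [0.6, 0.4], [0.95, 0.05]])
observed_ref = ref_components @ np.array([0.7, 0.3])

props = estimate_proportions(ref_components, observed_ref)
predicted_target = predict_target_levels(target_components, props)
```

The predicted target levels would then be compared with the observed target levels; a large discrepancy suggests that the sample is not well explained by the "normal" component mixture.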

The biological sample can be a stool sample.

The biological sample can be cerebrospinal fluid.

The initial set of nucleic acids can be treated to facilitate detection of methylated sites before sequencing.

The specified medical condition can be ovarian cancer, endometriosis, necrotizing enterocolitis, fetal aneuploidy, preeclampsia, or a brain condition.

The methylation profile for the filtered subset of nucleic acids can indicate, for each of a set of multiple genomic loci, a methylation level of the locus. Each genomic locus can be a CpG site, CpG island, differentially methylated region (DMR), promoter region, enhancer region, or CpG island shore.
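One common way to express the per-locus methylation level is the fraction of methylated observations among all observations of CpG cytosines in that locus. The sketch below assumes the upstream pipeline has already produced (locus, is-methylated) calls, one per read observation; the locus identifiers and the `min_coverage` cutoff are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

def methylation_profile(calls: Iterable[Tuple[str, bool]], min_coverage: int = 1) -> Dict[str, float]:
    """Compute the methylation level of each locus from per-read CpG calls.

    calls: (locus_id, is_methylated) pairs, one per read observation of a CpG
    cytosine within that locus. Returns the fraction of methylated observations
    for every locus covered by at least `min_coverage` observations.
    """
    methylated = defaultdict(int)
    total = defaultdict(int)
    for locus, is_methylated in calls:
        total[locus] += 1
        methylated[locus] += int(is_methylated)
    return {locus: methylated[locus] / n for locus, n in total.items() if n >= min_coverage}

calls = [("DMR1", True), ("DMR1", False), ("DMR1", True), ("DMR1", True),
         ("CGI2", False), ("CGI2", False), ("SITE3", True)]
profile = methylation_profile(calls)
```

Pooling counts across all CpGs in a region (rather than averaging per-site fractions) weights each site by its coverage; either convention is plausible, and this sketch uses pooling.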

Determining the likelihood that the person has the specified medical condition can include determining a probability that the person has the specified medical condition.

Determining the likelihood that the person has the specified medical condition can include generating a binary indication that the person either likely has the specified medical condition or likely does not have the specified medical condition.

Processing the methylation profile can include providing data representing the methylation profile as input to a machine-learning model, and obtaining the likelihood, or a value from which the likelihood is derived, as an output of the machine-learning model.

The machine-learning model can include at least one of a classifier, an artificial neural network, a support vector machine, a decision tree, or a regression model.
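As one example of the regression-model option listed above, a logistic regression can map a methylation profile to a probability and, with a threshold, to a binary indication. The sketch below is illustrative only: the training data are hypothetical, the model is fit with plain batch gradient descent, and the 0.5 threshold is an assumption rather than a value taken from this disclosure.

```python
import numpy as np

def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))

def train_logistic(X: np.ndarray, y: np.ndarray, lr: float = 1.0, epochs: int = 5000):
    """Fit logistic-regression weights by batch gradient descent on the mean log-loss.

    X: (n_samples, n_loci) methylation levels; y: 1 = has condition, 0 = does not.
    """
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        error = sigmoid(X @ w + b) - y  # gradient of the log-loss w.r.t. the logit
        w -= lr * (X.T @ error) / len(y)
        b -= lr * error.mean()
    return w, b

def predict(X: np.ndarray, w: np.ndarray, b: float, threshold: float = 0.5):
    """Return (probability of the condition, binary indication) for each sample."""
    prob = sigmoid(X @ w + b)
    return prob, prob >= threshold

# Hypothetical training set: methylation levels at three loci.
X_train = np.array([
    [0.20, 0.50, 0.50], [0.25, 0.40, 0.60], [0.15, 0.55, 0.45],  # controls
    [0.80, 0.50, 0.50], [0.75, 0.45, 0.55], [0.85, 0.60, 0.40],  # cases
])
y_train = np.array([0, 0, 0, 1, 1, 1], dtype=float)
w, b = train_logistic(X_train, y_train)
prob, flag = predict(np.array([[0.9, 0.5, 0.5], [0.1, 0.5, 0.5]]), w, b)
```

Returning both the probability and the thresholded flag mirrors the two output forms described above: a probability that the person has the condition, or a binary indication.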

The machine-learning model can define reference or predicted methylation profiles against which the methylation profile for the filtered subset is compared to determine the likelihood that the person has the specified medical condition.

The determined likelihood that the person has the specified medical condition can be used by a medical provider to assess whether to perform additional diagnostic testing on the person.

The determined likelihood that the person has the specified medical condition can be used by a medical provider to at least one of diagnose the person or treat the person for the specified medical condition.

Outputting the indication of the likelihood that the person has the specified medical condition can include at least one of presenting the indication on an electronic display, audibly playing the indication through a speaker, storing the indication in a memory of a computing system for subsequent retrieval, or transmitting the indication in an electronic message to one or more users.

Enriching the target nucleic acids in the filtered subset can include generating the filtered subset so that a fraction of the target nucleic acids that occur in the filtered subset is greater than a fraction of the target nucleic acids that occur in the initial set of nucleic acids.

The filtered subset can consist exclusively of the target nucleic acids. Alternatively, the filtered subset can include both the target nucleic acids and non-targeted nucleic acids.
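The enrichment criterion and the two filtered-subset options above can be illustrated with a short sketch. It keeps every target item and, optionally, every n-th background item; the deterministic every-n-th subsampling and all names are hypothetical choices made for this illustration. Setting `keep_every_nth_background` to 0 yields a subset consisting exclusively of targets.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")

def target_fraction(items: Sequence[T], is_target: Callable[[T], bool]) -> float:
    """Fraction of items that are target nucleic acids (0.0 for an empty set)."""
    if not items:
        return 0.0
    return sum(1 for x in items if is_target(x)) / len(items)

def enrich(items: Sequence[T], is_target: Callable[[T], bool],
           keep_every_nth_background: int = 0) -> List[T]:
    """Keep all targets; keep every n-th background item, or none when n == 0."""
    kept: List[T] = []
    background_seen = 0
    for x in items:
        if is_target(x):
            kept.append(x)
        else:
            if keep_every_nth_background and background_seen % keep_every_nth_background == 0:
                kept.append(x)
            background_seen += 1
    return kept

items = list(range(20))
is_target = lambda x: x % 5 == 0  # hypothetical: 4 of 20 items are targets
only_targets = enrich(items, is_target)
mixed = enrich(items, is_target, keep_every_nth_background=4)
```

In both cases the target fraction of the filtered subset exceeds the target fraction of the initial set (0.2 here), which is the enrichment condition described above.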

Some implementations include yet another method performed by a computing system. The method can include actions of obtaining initial sequence data that describes sequences of an initial set of nucleic acids from a biological sample of a person, the initial set of nucleic acids including nucleic acids originating from multiple different tissues of the person; filtering, by the computing system, the initial sequence data to identify a first subset of sequences from the initial sequence data that correspond to a first pre-defined set of genomic regions; filtering, by the computing system, the initial sequence data to identify a second subset of sequences from the initial sequence data that correspond to a second pre-defined set of genomic regions; processing, by the computing system, data that includes an observed methylation profile of the first subset of sequences to generate a predicted methylation profile of the second subset of sequences; comparing, by the computing system, an observed methylation profile of the second subset of sequences to the predicted methylation profile of the second subset of sequences to determine whether the person has a specified medical condition, wherein the person is deemed to have the specified medical condition if a difference between the observed methylation profile of the second subset of sequences and the predicted methylation profile of the second subset of sequences meets a minimum difference criterion; and outputting, by the computing system, an indication of whether the person was determined to have the specified medical condition.
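The "minimum difference criterion" can be made concrete with a simple sketch. The disclosure does not fix a particular difference measure; this example assumes a mean absolute difference across the target regions and a hypothetical threshold. The same check applies when the comparison is against a pre-defined methylation profile rather than a predicted one.

```python
import numpy as np

def profile_difference(observed, predicted) -> float:
    """Mean absolute difference between two methylation profiles over the same regions."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    if observed.shape != predicted.shape:
        raise ValueError("profiles must cover the same target regions in the same order")
    return float(np.mean(np.abs(observed - predicted)))

def has_condition(observed, predicted, min_difference: float) -> bool:
    """Deem the condition present when the difference meets the minimum criterion."""
    return profile_difference(observed, predicted) >= min_difference

observed = [0.50, 0.60, 0.90]   # hypothetical observed target-region levels
predicted = [0.50, 0.60, 0.60]  # hypothetical predicted levels for a normal individual
```

Other measures, such as a root-mean-square difference or a per-region statistical test, would fit the same structure by replacing `profile_difference`.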

These and other implementations can further include one or more of the following features.

The first pre-defined set of genomic regions can be regions that exhibit a minimum level of stability with respect to at least one of a methylation characteristic or a copy number characteristic in a population of individuals. The second pre-defined set of genomic regions can be regions that exhibit a minimum difference with respect to at least one of the methylation characteristic or the copy number characteristic between a first sub-population of individuals who have the specified medical condition and a second sub-population of individuals who do not have the specified medical condition.
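The two region-selection rules above can be sketched over a population matrix of methylation levels. This is an illustration under stated assumptions: "stability" is measured here as the standard deviation across individuals, "difference" as the absolute difference of group means, and both cutoffs and the data are hypothetical.

```python
import numpy as np

def select_reference_regions(levels: np.ndarray, max_std: float) -> list:
    """Indices of regions whose methylation varies little across the population.

    levels: (n_individuals, n_regions) methylation levels.
    """
    return np.flatnonzero(levels.std(axis=0) <= max_std).tolist()

def select_target_regions(levels: np.ndarray, has_condition, min_diff: float) -> list:
    """Indices of regions whose mean methylation differs between cases and controls."""
    has_condition = np.asarray(has_condition, dtype=bool)
    case_mean = levels[has_condition].mean(axis=0)
    control_mean = levels[~has_condition].mean(axis=0)
    return np.flatnonzero(np.abs(case_mean - control_mean) >= min_diff).tolist()

levels = np.array([
    [0.80, 0.20, 0.50],
    [0.82, 0.70, 0.10],
    [0.79, 0.25, 0.55],
    [0.81, 0.75, 0.05],
])
status = [False, True, False, True]  # hypothetical case/control labels
reference = select_reference_regions(levels, max_std=0.05)
target = select_target_regions(levels, status, min_diff=0.3)
```

In practice a statistical test with multiple-testing correction would likely replace the fixed cutoffs, but the division into a stable reference set and a discriminating target set is the same.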

The first pre-defined set of genomic regions can be a first reference set of genomic regions, and the second pre-defined set of genomic regions can be a first target set of genomic regions. The actions can further include selecting the first reference set of genomic regions as the first pre-defined set of genomic regions from a database that includes multiple reference sets of genomic regions, wherein different ones of the multiple reference sets of genomic regions correspond to different medical conditions; and selecting the first target set of genomic regions as the second pre-defined set of genomic regions from the database, wherein the database further includes multiple target sets of genomic regions, wherein different ones of the multiple target sets of genomic regions correspond to different medical conditions.
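A database keyed by medical condition, as described above, can be as simple as an in-memory mapping from condition name to a pair of region sets. The class names and the coordinates below are hypothetical placeholders, not loci taken from this disclosure.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple

Region = Tuple[str, int, int]  # (chrom, start, end)

@dataclass(frozen=True)
class RegionSets:
    reference: Tuple[Region, ...]
    target: Tuple[Region, ...]

class RegionDatabase:
    """Hypothetical store of reference and target region sets per medical condition."""

    def __init__(self) -> None:
        self._sets: Dict[str, RegionSets] = {}

    def register(self, condition: str, reference: Iterable[Region], target: Iterable[Region]) -> None:
        self._sets[condition] = RegionSets(tuple(reference), tuple(target))

    def select(self, condition: str) -> RegionSets:
        try:
            return self._sets[condition]
        except KeyError:
            raise KeyError(f"no region sets registered for {condition!r}") from None

db = RegionDatabase()
db.register("preeclampsia", [("chr1", 0, 100)], [("chr19", 500, 600)])
db.register("ovarian cancer", [("chr2", 0, 100)], [("chr7", 900, 1000)])
sets = db.select("preeclampsia")
```

Selecting both sets through one lookup keeps each reference set paired with the target set that was derived for the same condition.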

The specified medical condition can be preeclampsia, endometriosis, ovarian cancer, necrotizing enterocolitis, or a brain condition.

Some implementations include yet another method performed by a computing system. The method can include actions of obtaining, by a computing system, initial sequence data that describes sequences of an initial set of nucleic acids from a biological sample of a person, the initial set of nucleic acids including nucleic acids originating from multiple different tissues of the person; filtering, by the computing system, the initial sequence data to identify a target subset of sequences from the initial sequence data that correspond to a pre-defined set of genomic regions; comparing, by the computing system, an observed methylation profile of the target subset of sequences to a pre-defined methylation profile to determine whether the person has a specified medical condition, wherein the person is deemed to have the specified medical condition if a difference between the observed methylation profile of the target subset of sequences and the pre-defined methylation profile meets a minimum difference criterion; and outputting, by the computing system, an indication of whether the person was determined to have the specified medical condition.

Additional aspects of the disclosed subject matter include a computing system having one or more processors and one or more computer-readable media having instructions stored thereon that, when executed by the one or more processors, cause the one or more processors to perform actions of any of the methods/processes disclosed herein. Further aspects include one or more computer-readable media having instructions stored thereon that, when executed by one or more processors, cause the one or more processors to perform actions of any of the methods/processes disclosed herein.

Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention pertains. Although methods and materials similar or equivalent to those described herein can be used in the practice or testing of the present invention, suitable methods and materials are described below.

Other features and advantages of the invention will be apparent from the following detailed description, and from the claims.

BRIEF DESCRIPTION OF DRAWINGS

FIG. 1 depicts a computing environment adapted for processing of sequence data to classify a patient as likely or not likely exhibiting a specified medical condition.

FIG. 2 is a functional illustration of a process for classifying a person as either exhibiting a specified medical condition or not based on methylation patterns in nucleic acids recovered from biological samples.

FIG. 3 is a depiction of a filtering operation that results in increasing the fraction of sequences for target nucleic acids in a set of sequence data.

FIG. 4 illustrates a set of methylation patterns on multiple fragments of cell-free DNA of an individual, along with an example of a bisulfite treatment process.

FIG. 5 is a flowchart of an example process for training and using a machine-learning model to determine whether a specified medical condition of a person or other mammal is or is not normal.

FIG. 6 is a flowchart of an example process for processing sequence data for nucleic acids from a biological sample to determine a methylation profile of the sample and classify a person with respect to the specified medical condition.

FIG. 7 is a flowchart of an example process for training a machine-learning model to generate, from a methylation profile of a patient, a classification result indicating whether a specified medical condition of the patient is or is not normal.

FIG. 8 depicts a block diagram of computing devices that can be used in some embodiments to implement aspects of the techniques disclosed herein.

FIGS. 9A-9G display the distribution of CpG DNA methylation in plasma samples obtained from pregnant (blue) and non-pregnant (red) women. (FIG. 9A) All CpG sites; (FIG. 9B) CpG sites in promoters; (FIG. 9C) CpG sites in introns; (FIG. 9D) CpG sites in exons; (FIG. 9E) CpG sites in CpG island shores; (FIG. 9F) CpG sites in CpG islands; and (FIG. 9G) CpG sites in enhancers. x axis: % DNA Methylation; y axis: Density.

FIG. 10 provides a hierarchical clustering of plasma methylation sequencing data obtained from pregnant (PL) and non-pregnant (EN) women.

FIG. 11 provides chromosome-specific plots comparing the spatial distribution of DNA methylation in plasma samples obtained from pregnant women at 10-13 weeks gestation (blue) versus non-pregnant women (red).

FIGS. 12A-12F show the spatial distribution of methylation levels in defined genomic elements on chromosome 19 in cfDNA from pregnant (blue) and non-pregnant female plasma (red). (FIG. 12A) Promoters; (FIG. 12B) Exons; (FIG. 12C) Introns; (FIG. 12D) CpG Island Shores; (FIG. 12E) Enhancers; and (FIG. 12F) CpG Islands.

FIGS. 13A-13G display the distribution of DNA methylation in chorionic villus sampling (CVS) samples (red) and maternal leukocyte samples (blue) obtained from pregnant women between 10-13 weeks gestation. (FIG. 13A) All sites; (FIG. 13B) Sites in promoters; (FIG. 13C) Sites in introns; (FIG. 13D) Sites in exons; (FIG. 13E) Sites in CpG island shores; (FIG. 13F) Sites in CpG islands; and (FIG. 13G) Sites in enhancers.

FIG. 14 displays the genomic element-specific distribution of CpG DNA methylation in CVS samples (red) and maternal leukocyte samples (blue) obtained from pregnant women between 10-13 weeks gestation.

FIGS. 15A-15B provide heatmaps of differentially methylated CpG sites in maternal plasma collected between 10-13 weeks gestation from pregnant women who later developed severe preeclampsia (SPE)<32 versus normal controls (FIG. 15A) or mild preeclampsia (MPE) versus normal controls (FIG. 15B).

FIG. 16 shows the DNA methylation patterns in plasma samples obtained at 10-13 weeks gestation from women who went on to develop severe preeclampsia (red) or mild preeclampsia (blue) compared to normal controls (black). Methylation fraction displayed on the y-axis and the genomic coordinates for chromosome 1 are displayed on the x-axis.

FIG. 17 shows spatial distribution of methylation levels on chromosome 19 in cfDNA samples from pregnant (blue) and non-pregnant female plasma.

FIGS. 18A-18X show the influence of tissue-specific DNA methylation changes between chorionic villus (CV; red) and maternal leukocytes (MBC; blue) on DNA methylation in plasma from pregnant (dashed red) and non-pregnant (dashed blue) women at specific loci. FIGS. 18A and 18B=RUNX3; FIGS. 18C and 18D=SOX8; FIGS. 18E and 18F=GLI2; FIGS. 18G and 18H=ADGRA1; FIGS. 18I and 18J=ARHGEF16; FIGS. 18K and 18L=MAF; FIGS. 18M and 18N=TBX15; FIGS. 18O and 18P=TFAP2B; FIGS. 18Q and 18R=VGLL4; FIGS. 18S and 18T=FOXP1; FIGS. 18U and 18V=HHEX; FIGS. 18W and 18X=BCAT1.

FIG. 19 shows CV tissue is broadly hypomethylated relative to maternal leukocytes based upon unbiased genome-wide microarray analysis (Chu et al., PLoS One. 2011; 6:e14723).

FIGS. 20A-20B illustrate, on a genome-wide level, the presence of PMDs in CV tissue (FIG. 20A) relative to maternal leukocytes (FIG. 20B). Beta represents DNA methylation rate (0-1.0 scale) (Chu et al., PLoS One. 2011; 6:e14723).

FIGS. 21A-21C each contain three panels. Top panel: moving average of the hypomethylation levels of CpG sites in CV (solid line) and MBC (dashed line). Middle panel: moving average of the difference in the hypomethylation levels of CpG sites between CV and MBC. The short dense vertical lines above the x axis (appearing as a solid horizontal line) in both the top and bottom panels represent locations of the MspI sites in each chromosome. Bottom panel: histogram of the ESTs and mRNAs aligned to the chromosome, generated using NCBI genome Map Viewer. (FIG. 21A) Chromosome 13; (FIG. 21B) Chromosome 18; (FIG. 21C) Chromosome 21.

FIG. 22 shows a representative region of chromosome 1 (red box) with statistically significant differences in CpG methylation rate between CVS and maternal leukocytes. Sites that are hypomethylated in CV relative to MBC are shown in red below the x axis, whereas sites hypomethylated in maternal leukocytes relative to CV are shown in blue above the x axis.

FIG. 23 shows the relationship between genomic context of CpG location and differential methylation in chorionic villus.

FIGS. 24A-24D show neonatal gut epithelial cells before (FIGS. 24A and 24C) and after (FIGS. 24B and 24D) LCM. Representative samples are shown (n=2).

FIG. 25 shows hierarchical clustering of DNA methylation patterns determined by WGBS following LCM of neonatal gut epithelial cells.

FIGS. 26A-26C show plots of fragment size against read count for plasma (EN1, FIG. 26A) and CSF samples (LP372, FIG. 26B; LP374, FIG. 26C), confirming that cfDNA fragments from CSF are significantly shorter than those from plasma.

FIG. 27 shows plot of fragment size distribution for cfDNA from CSF (LP372; red and LP374; blue) and plasma (EN1; green). CSF cfDNA exhibits a periodicity of fragment density peaks differing in size by approximately 10 bp.

FIG. 28A shows relative distribution of low methylation (<20%), intermediate methylation (20-80%) and high methylation (HM) groups in cfDNA from plasma and CSF.

FIG. 28B shows distribution of DNA methylation in CSF and plasma samples adjusted for specific categories of genomic element (intergenic, intron, exon or promoter).

FIG. 28C shows distribution of DNA methylation in CSF and plasma samples adjusted for specific categories of genomic element (CGI and CGI shore).

FIG. 29 provides a sliding-window analysis identifying genomic regions (250 bp) whose CpG methylation characteristics differ significantly between cfDNA from CSF and plasma.

FIG. 30 provides the 100 most differentially methylated loci at which cfDNA from CSF is methylated at a lower rate than that of plasma.

FIG. 31 provides the 100 most differentially methylated loci at which cfDNA from CSF is methylated at a higher rate than that of plasma.

FIG. 32 provides differentially methylated regions that overlap with genes whose expression is high in the brain and low in whole blood.

FIG. 33 shows the results of a method of detecting fetal aneuploidy in accordance with a CM model, using all CpG sites.

FIG. 34 shows the results of a method of detecting fetal aneuploidy in accordance with a CM model, using CpG sites differentially methylated between CVS and MBC.

FIG. 35 shows the results of a method of detecting fetal aneuploidy in accordance with a CnP model, using all CpG sites.

FIG. 36 shows the results of a method of detecting fetal aneuploidy in accordance with a CnP model, using CpG sites differentially methylated between CVS and MBC.

FIG. 37 shows the results of a method of detecting fetal aneuploidy in accordance with a PnP model, using all CpG sites.

FIG. 38 shows the results of a method of detecting fetal aneuploidy in accordance with a PnP model, using CpG sites differentially methylated between CVS and MBC.

FIG. 39. Hierarchical clustering of CpG sites identified by bisulfite sequencing of plasma DNA from women who had or had not previously had a hysterectomy.

FIG. 40. Pre- (Panels A and C) and post- (Panels B and D) LCM of neonatal gut epithelial cells. Representative samples are shown (n=2).

FIG. 41. Unsupervised clustering of whole genome sequencing data from laser captured NEC (green) and control (blue) colonic epithelium.

FIG. 42A. Hierarchical clustering of CpG sites identified by whole-genome bisulfite sequencing (WGBS) that were differentially methylated between laser captured gut epithelium from NEC colon (NEC) and control (Ctrl) samples.

FIG. 42B. Hierarchical clustering of CpG sites identified by WGBS that were differentially methylated between laser captured gut epithelium from NEC ileum (NEC) and control (Ctrl) samples.

FIG. 43A. Classical multi-dimensional scaling of targeted genome-wide bisulfite sequencing data from histological sections of NEC (blue) and control (green) colonic epithelium.

FIG. 43B. Classical multi-dimensional scaling of targeted genome-wide bisulfite sequencing data from histological sections of NEC (blue) and control (green) ileal epithelium.

FIG. 44. Hierarchical clustering of CpG sites identified by targeted whole-genome bisulfite sequencing that were differentially methylated between whole blood samples from NEC-affected neonates (NEC) and control individuals (Ctrl).

FIG. 45. Hierarchical clustering of CpG sites identified by targeted whole-genome bisulfite sequencing that were differentially methylated between stool samples from NEC-affected neonates (NEC) and control individuals (Ctrl).

FIG. 46. Differences and overlap between NEC-specific differentially methylated CpG sites identified in blood and stool. CpGs hypermethylated in NEC versus control (designated “up”) are labeled in red. CpGs hypomethylated in NEC versus control (designated “down”) are labeled in blue.

FIG. 47 depicts density plots showing the distribution of DNA methylation across the genome in distinct genomic elements including (A) all sites, (B) introns, (C) exons, (D) enhancers, (E) CpG islands, (F) promoters, and (G) CpG island shores. Data obtained from the plasma cfDNA of ovarian cancer patients (red) and normal controls (blue) are shown. In each case, the blue line is the higher line at the second peak. X axis=density. Y axis=% DNA methylation.

DETAILED DESCRIPTION

FIG. 1 depicts a computing environment 100 adapted for processing of sequence data to classify a patient as likely or not likely exhibiting a specified medical condition. The environment 100 is configured to train machine-learning model(s) corresponding to one or more medical conditions, and to apply the trained machine-learning models in classification tasks that facilitate screening, diagnosis, and/or treatment of medical conditions of a patient. In some examples, the computing environment 100 allows for improved efficiency and accuracy in screening for conditions such as pre-eclampsia, ovarian cancer, endometriosis, necrotizing enterocolitis, fetal aneuploidy, and/or certain brain or nervous system disorders, through analysis of sequence data derived from biological samples obtained from a patient via minimally invasive techniques (e.g., plasma or stool samples). The computing environment 100 can further employ filtering engines that filter an initial batch of sequencing data to reduce the universe of nucleic acids analyzed when classifying a given medical condition, thereby improving sensitivity of the classifier, while reducing model size and reducing the number of operations required to generate a classification result.

Environment 100 includes a sample analyzer 104 that processes and sequences biological samples 150 from a person. In some implementations, the system is configured to perform “liquid biopsies” on liquid-based biological samples 150 from a person, e.g., plasma samples, stool samples, saliva samples, cerebrospinal fluid samples, urine samples, or cervical swab samples. The plasma (or other biological samples) may include cell-free DNA from the person (and, in some cases, cell-free DNA from a fetus if the person is a pregnant female). Cell-free DNA can originate from various tissues in a person's body. For example, cell-free DNA in a biological sample 150 can include fragments of DNA that were released from cells as a result of processes such as active secretion, necrosis, apoptosis, or a combination of these. The level of cell-free DNA in a sample 150 originating from certain tissue(s) can be correlated with a given medical condition (e.g., disease). Moreover, the methylation patterns of cell-free DNA originating from certain tissue(s), and which occur in the biological sample 150, can be correlated with a medical condition (e.g., disease) of the person. To analyze the level of cell-free DNA or other nucleic acids associated with a specified medical condition, and to analyze the methylation patterns of these nucleic acids, the sample analyzer 104 may process the biological sample 150 to generate initial sequence data for the fragments of extracellular DNA and/or other nucleic acids in the sample 150.

The initial sequence data describes sequences of nucleotides occurring in nucleic acids from the sample 150, e.g., sequences of bases along fragments of cell-free DNA. Typically, the initial sequence data includes sequence descriptions for both targeted nucleic acids and background (non-targeted) nucleic acids in the biological sample 150. The targeted nucleic acids are those that are deemed significant to detection of a specified medical condition, while the background nucleic acids are deemed insignificant or less significant to detection of the specified medical condition. The mixture of targeted and background nucleic acids may vary for different medical conditions. For example, some nucleic acids may be classified in the target set for detection of ovarian cancer, while those same nucleic acids may be classified in the background set for detection of necrotizing enterocolitis. In some implementations, the set of target nucleic acids that are deemed significant to a particular medical condition are those originating from particular tissue(s) affected by the specified medical condition. For instance, the target nucleic acids associated with endometriosis may include or consist of cell-free DNA from the endometrium or uterus. Likewise, the target nucleic acids associated with necrotizing enterocolitis may be derived from intestinal tissue. Often, the fraction of targeted nucleic acids in the biological sample is small in relation to the fraction of background nucleic acids, and the fraction of sequences reflected in the initial sequence data for targeted nucleic acids may also be small in relation to sequences of background nucleic acids. As a result, the initial sequence data may contain substantial levels of "noise" relative to the signals present in the sequences for targeted nucleic acids, which can degrade the performance of models in predicting whether or not a patient likely has a specified medical condition.

The sample analyzer 104 is configured to sequence nucleic acids in the biological sample 150 using any suitable technique, including polymerase chain reaction (PCR)-based methods such as droplet digital PCR, or next-generation sequencing (NGS). In some implementations, nucleic acids from the biological sample 150 undergo bisulfite conversion before sequencing in order to facilitate subsequent detection of methylated sites. Bisulfite treatment has the effect of converting unmethylated cytosines (C) to uracil (U), which in turn are read as thymine (T) after DNA amplification. In contrast, bisulfite treatment leaves methylated cytosines unchanged. As a result, bisulfite conversion enables differentiation of methylated from non-methylated cytosines, which appear as different bases (C versus T) in the sequencing data. Methylation arrays, methylation-specific PCR, enrichment, and/or additional methods may also or alternatively be applied.
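To make the C-versus-T readout concrete, the following minimal Python sketch calls methylation at the CpG sites covered by a single bisulfite-converted read. It assumes, for simplicity, that the read is already aligned to the forward strand of the reference at the same start position with no insertions or deletions; the function name `call_cpg_methylation` is hypothetical and is not part of the disclosed system.

```python
from typing import Dict


def call_cpg_methylation(reference: str, read: str) -> Dict[int, bool]:
    """Call methylation at each CpG site covered by a bisulfite-converted read.

    Assumes (hypothetically) that `read` is aligned to the forward strand of
    `reference` at the same start position, with no insertions or deletions.
    Returns {0-based reference position: True if methylated, False if not}.
    """
    if len(reference) != len(read):
        raise ValueError("read and reference must have the same length")
    calls: Dict[int, bool] = {}
    for pos in range(len(reference) - 1):
        # Only cytosines followed by guanine in the reference are CpG sites.
        if reference[pos] != "C" or reference[pos + 1] != "G":
            continue
        base = read[pos]
        if base == "C":
            calls[pos] = True   # methylated C was protected from conversion
        elif base == "T":
            calls[pos] = False  # unmethylated C -> U (bisulfite) -> T (PCR)
        # Any other base (e.g., a sequencing error) is left uncalled.
    return calls


# Reference CpG sites at positions 1 and 5; the read keeps C at 1, shows T at 5.
print(call_cpg_methylation("ACGTTCGA", "ACGTTTGA"))  # {1: True, 5: False}
```

Real pipelines also handle reverse-strand reads, indels, and non-CpG contexts; those are omitted here.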

Sample analyzer 104 can output initial sequencing data for a biological sample 150 for receipt by a user's computing system 102. The user's computing system 102 can comprise one or more computers in one or more locations. System 102 can be, for example, a desktop computer, notebook computer, tablet computer, or smartphone. System 102 includes a network interface for communicating over one or more networks 106 such as a local area network (LAN), a wireless LAN (WLAN), the Internet, or a combination of these. System 102 may include peripherals such as a keyboard, pointing device, and display screen to enable user interaction with the system 102. In some implementations, the user may coordinate activities at the system 102 for obtaining a classification result related to a specified medical condition based on sequencing data from biological sample 150. For example, upon obtaining initial sequencing data from sample analyzer 104, the user may instruct system 102 to send a classification request 152 to classification system 120. Classification system 120 can comprise one or more computers in one or more locations. System 120 may be located remotely from the user's system 102, or may be located at the same premises as system 102. In some implementations, the capabilities of systems 102, 120, 110, or any two of these, are consolidated in a single, integrated system. In general, classification system 120 processes a classification request 152 and returns a classification result 158 that indicates a predicted likelihood that the person who provided biological sample 150 either has or does not have a specified medical condition (e.g., preeclampsia, necrotizing enterocolitis, endometriosis, ovarian cancer, fetal aneuploidy, an abnormal brain condition, or others).

In more detail, classification system 120 can include a package selector 122, filtering engine 124, methylation profiler 126, and model evaluator 128. Each of these components 122, 124, 126, and 128 may be implemented on one or more computers using a combination of software, hardware, or firmware. Package selector 122 is operable to select, from a library of medical-condition packages 140 a-n, a particular package 140 that corresponds to the medical condition specified in classification request 152. Each package 140 a-n defines information or instructions usable by classification system 120 to generate a classification result as to a different medical condition. In some implementations, the package 140 for a given medical condition can include one or more sequence filters 142, a loci list 144, and a machine-learning (ML) model 146, each of which is specific to the corresponding medical condition. Classification system 120 can load a retrieved package 140 to facilitate generation of a classification result 158 responsive to request 152. In some implementations, package selector 122 can select individual components of a package 140 as needed, such as filters 142, loci list 144, or ML model 146, to the exclusion of the others.

Sequence filters 142 provide information that enables a filtering engine 124 of the classification system 120 to filter initial sequence data by retaining sequences for target nucleic acids corresponding to the specified medical condition and discarding sequences for background (non-targeted) nucleic acids for the specified medical condition. In some implementations, filters 142 provide a whitelist that identifies target nucleic acids that should be retained in a filtering operation (while other nucleic acids not specified in the whitelist are discarded), such as cell-free DNA fragments that originate uniquely from a particular tissue associated with the specified medical condition. In some implementations, filters 142 provide a blacklist that identifies background nucleic acids that should be discarded in a filtering operation, such as cell-free DNA fragments that do not originate uniquely from a particular tissue associated with the specified medical condition. The whitelist, blacklist, or other information in filters 142 can define a set of genomic regions or sequences of nucleic acids against which sequences in the initial set of nucleic acids from the classification request 152 are compared to assess whether to retain or discard them for further processing. For example, all nucleic acids in a plasma sample can be sequenced and the filtering can remove certain sequences identified in the blacklist. Alternatively, all nucleic acids in the sample can be sequenced and the filtering can discard everything but the sequences identified in a whitelist.

In some implementations, the filtering engine 124 is configured to perform a filtering operation that involves (i) selecting target nucleic acids from the initial set of nucleic acids based on a methylation characteristic (e.g., methylation status), a copy number characteristic of the target nucleic acids, or both, and (ii) enriching the target nucleic acids, e.g., to increase the fraction of the target nucleic acids after filtering relative to their fraction in the pre-filtered set. In some examples, filters 142 identify a pre-defined set of genomic regions in a whitelist. Selecting target nucleic acids from the initial set of nucleic acids can include comparing sequences of nucleic acids from the initial set of nucleic acids to sequences in the pre-defined set of genomic regions. Enriching the target nucleic acids in the filtered subset can include discarding sequences of nucleic acids from the initial set of nucleic acids that do not appear in the pre-defined set of regions, while retaining nucleic acids that do appear in the pre-defined set of regions. The pre-defined set of genomic regions can identify target nucleic acids having at least a minimum level of stability (e.g., a minimum threshold stability) with respect to the methylation characteristic and the copy number characteristic, wherein the identified target nucleic acids originate from multiple different tissues. Additionally or alternatively, the pre-defined set of genomic regions can identify target nucleic acids originating from a subset of the multiple different tissues for which at least one of the methylation characteristic or the copy number characteristic differs by at least a minimum amount (e.g., a minimum threshold) between individuals who have the specified medical condition and individuals who do not have the specified medical condition.
In other examples, filters 142 identify a pre-defined set of genomic regions or sequences in a blacklist/exclude list, and corresponding operations can apply to select and enrich target nucleic acids except that the target nucleic acids are identified by discarding nucleic acids within the blacklist/exclude list.
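The whitelist and blacklist operations can be illustrated as an interval-overlap test between fragment coordinates and the pre-defined genomic regions. In this minimal Python sketch, the `Fragment` and `Region` types, the coordinates, and the `is_target` label (known only in a simulation, where a fragment's tissue of origin is available) are all hypothetical; `target_fraction` shows how filtering raises the fraction of target sequences.

```python
from typing import List, NamedTuple


class Region(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


class Fragment(NamedTuple):
    chrom: str
    start: int
    end: int
    is_target: bool = False  # hypothetical ground truth, known only in simulations


def overlaps(frag: Fragment, region: Region) -> bool:
    """True if the half-open intervals share at least one base."""
    return frag.chrom == region.chrom and frag.start < region.end and region.start < frag.end


def filter_fragments(fragments: List[Fragment], regions: List[Region],
                     mode: str = "whitelist") -> List[Fragment]:
    """Whitelist: keep fragments overlapping a region. Blacklist: drop them."""
    if mode not in ("whitelist", "blacklist"):
        raise ValueError("mode must be 'whitelist' or 'blacklist'")
    keep_hits = mode == "whitelist"
    return [f for f in fragments
            if any(overlaps(f, r) for r in regions) == keep_hits]


def target_fraction(fragments: List[Fragment]) -> float:
    """Fraction of fragments labeled as target (0.0 for an empty list)."""
    return sum(f.is_target for f in fragments) / len(fragments) if fragments else 0.0


fragments = [
    Fragment("chr1", 100, 250, is_target=True),
    Fragment("chr1", 5000, 5150),
    Fragment("chr2", 300, 450),
    Fragment("chr3", 10, 160),
]
kept = filter_fragments(fragments, [Region("chr1", 0, 1000)])
print(target_fraction(fragments), target_fraction(kept))  # 0.25 1.0
```

The linear scan over regions is chosen for clarity; with a genome-scale region list, a sorted index or interval tree would be the more practical choice.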

Loci list 144 identifies the set of genomic loci whose methylation statuses are processed to generate a classification result with respect to a particular medical condition. The methylation profiler 126 within classification system 120 can use the loci list 144 to construct a methylation profile for a set of nucleic acids. For example, the loci list 144 may identify specific nucleotides, CpG sites, CpG islands, differentially methylated regions, promoter regions, enhancer regions, and/or CpG island shores whose methylation statuses can be processed to inform a classification with respect to the corresponding medical condition. In some implementations, the loci list 144 identifies genomic loci that occur only within the set of target nucleic acids for the medical condition. The loci list 144 can provide that the methylation statuses of all CpG sites within the target nucleic acids should be processed to generate a classification result. Alternatively, the loci list 144 can provide that the methylation statuses of only a subset of CpG sites within the target nucleic acids should be processed to generate a classification result. The subset of CpG sites (or other genomic loci) can be those deemed the most statistically significant, or those that have the highest predictive power, for accurately classifying whether a patient has or does not have a specified medical condition. In some implementations, the loci list 144 identifies genomic loci that occur anywhere in the genome, regardless of whether the loci occur in target or background nucleic acids. The genomic loci in this embodiment may be processed without a separate filtering step that discards all or some of the background nucleic acid sequences, and the loci may have been identified as the most statistically significant, or those having the highest predictive power across the genome, for accurately classifying whether a patient has or does not have a specified medical condition.
The methylation profiler 126 can analyze the initial set of sequence data from the classification request 152 or the filtered set of sequence data from filtering engine 124 to determine the methylation status of all or some of the loci identified in list 144. The methylation status can be expressed in a number of ways, such as a binary value indicating whether the methylation level at a locus is above or below a pre-defined threshold, or a normalized value within a pre-defined range of values indicating a relative methylation level at the locus across multiple DNA fragments encompassing the locus.
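Both forms of methylation status mentioned above can be derived from per-fragment calls. This Python sketch assumes each fragment's calls are given as a dictionary from locus position to a methylated/unmethylated flag (for instance, the output of a per-read caller); the function names and the 0.5 default threshold are illustrative assumptions, not values from the disclosure.

```python
from typing import Dict, List


def methylation_levels(calls_per_fragment: List[Dict[int, bool]],
                       loci: List[int]) -> Dict[int, float]:
    """Per-locus fraction of covering fragments that are methylated.

    Loci covered by no fragment are omitted rather than guessed.
    """
    levels: Dict[int, float] = {}
    for locus in loci:
        observed = [calls[locus] for calls in calls_per_fragment if locus in calls]
        if observed:
            levels[locus] = sum(observed) / len(observed)
    return levels


def binarize(levels: Dict[int, float], threshold: float = 0.5) -> Dict[int, int]:
    """1 if the methylation level is above the threshold, else 0."""
    return {locus: int(level > threshold) for locus, level in levels.items()}


calls = [{1: True, 5: False}, {1: True, 5: True}, {1: False}]
levels = methylation_levels(calls, loci=[1, 5, 9])
print(levels)            # locus 1: 2/3, locus 5: 0.5; locus 9 is uncovered
print(binarize(levels))  # {1: 1, 5: 0}
```

The normalized value is the fraction of covering fragments methylated at the locus, and the binary value thresholds that fraction.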

As used interchangeably herein, “methylation state,” “methylation profile,” “methylation status,” and “methylation level” refer to the presence, absence, percentage, and/or quantity of methylation at a particular nucleotide, or nucleotides, within a DNA region, e.g., a genomic locus. The methylation status of a particular DNA sequence (e.g., a genomic locus) can indicate the methylation state of every nucleotide in the sequence, the methylation state of any of the nucleotides (e.g., cytosines) in the sequence, the methylation state of a subset of the nucleotides (e.g., of cytosines), the percentage or fraction of methylated cytosines at any particular stretch of nucleotides within the sequence, or the average rate of methylation of all the cytosines (or a subset of the cytosines) present in a nucleic acid.

As used herein, a “methylated nucleic acid molecule” refers to a nucleic acid molecule that contains one or more methylated nucleotides.

As used herein, a “CpG site” or “methylation site” is a nucleotide within a nucleic acid that is susceptible to methylation either by naturally occurring events in vivo or by an event instituted to chemically methylate the nucleotide in vitro. A “CpG island,” as used herein, describes a segment of a nucleic acid, e.g., a DNA sequence, that has a high frequency of CpG dinucleotide repeats. A “CpG island shore,” as used herein, refers to methylation hotspots that are present a short distance, e.g., less than 2 kb, from CpG islands.

Machine-learning model 146 is a model that correlates methylation patterns from a methylation profile to likelihoods that a person has or does not have a specified medical condition. The machine-learning model 146 can be loaded by model evaluator 128 in classification system 120. Further details on training and applying machine-learning models are described with respect to FIGS. 5-7. For example, model evaluator 128 can provide the methylation profile from methylation profiler 126 as input to the machine-learning model 146, and can evaluate the machine-learning model 146 based on the methylation profile to generate a classification result 158 indicating whether the patient likely has or does not have the specified medical condition. The machine-learning model 146 can be implemented in a number of different forms. Typically, model 146 is trained using supervised learning techniques, and can be a linear regression model, logistic regression model, decision tree, support vector machine, naïve Bayes model, random forest model, or artificial neural network (e.g., a feedforward neural network, deep neural network, recurrent neural network, and/or convolutional neural network), or a combination of two or more of these.

Environment 100 can further include a machine-learning system 110 for training the machine-learning models 146 in the library of packages 140 a-n. System 110 can be implemented on one or more computers in one or more locations, and may be accessible to the user's system 102 and classification system 120 via a direct connection or by connection over a network 106. In some implementations, the functionality of system 110 can be integrated with the user's system 102, classification system 120, or both. System 110 can include a training data receiver 112, training engine 114, and model provider 116. Receiver 112 is configured to receive training data 132 for training a machine-learning model. Different training data sets 132 can be used to train different models corresponding to different medical conditions. For example, a first training data set may be constructed for training a model that screens for preeclampsia, while a second training data set may be constructed for training a model that screens for endometriosis. The training data 132 can include a collection of training samples 130, each training sample 130 comprising (i) a set of filtered or unfiltered sequence data describing sequences of nucleic acids from a biological sample of a person and (ii) a label indicating whether or not that person exhibits a specified medical condition. The label can serve as a target output for the model 146 when evaluated on a methylation profile derived from the sequence data in the training sample. Further details of a process for training a machine-learning model 146 are described below with respect to FIGS. 5 and 6. Training engine 114 receives the training data 132 from receiver 112 and processes this data 132 to train model 146. The trained model 146 can then be returned by model provider 116 and stored in the library, in the package for the corresponding medical condition on which the model 146 was trained.
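Logistic regression is one of the model families listed for model 146. As a hedged illustration of how labeled methylation profiles could train such a model, the following pure-Python sketch fits logistic-regression weights by full-batch gradient descent on log loss. The function names, learning rate, epoch count, and toy data are hypothetical choices for this sketch, not parameters of the disclosed training engine 114.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(t: float) -> float:
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def predict_probability(weights: Sequence[float], bias: float,
                        profile: Sequence[float]) -> float:
    """Predicted likelihood that the person has the specified condition."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, profile)))


def train_logistic(profiles: List[Sequence[float]], labels: List[int],
                   lr: float = 0.5, epochs: int = 2000) -> Tuple[List[float], float]:
    """Fit logistic-regression weights by full-batch gradient descent on log loss.

    profiles[n][f] is the methylation level of locus f in training sample n;
    labels[n] is 1 if that person has the condition, else 0.
    """
    n_samples, n_features = len(profiles), len(profiles[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in zip(profiles, labels):
            err = predict_probability(weights, bias, x) - y  # d(log loss)/d(logit)
            for f in range(n_features):
                grad_w[f] += err * x[f]
            grad_b += err
        weights = [w - lr * g / n_samples for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n_samples
    return weights, bias


# Toy, made-up profiles over two loci (not real data).
profiles = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.1, 0.9]]
labels = [1, 1, 0, 0]
w, b = train_logistic(profiles, labels)
print(predict_probability(w, b, [0.85, 0.15]) > 0.5)  # True
```

Any of the other listed model types could be substituted; logistic regression is used here only because it is short to write without external libraries.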

FIG. 2 depicts a functional illustration of a process 200 for classifying a person as either exhibiting a specified medical condition or not based on methylation patterns in nucleic acids recovered from biological samples such as plasma, cerebrospinal fluid, or stool. Filtering engine 124 receives initial sequence data 202, which contains information describing sequences of nucleic acids (e.g., cell-free DNA) from the biological sample. The initial sequence data 202 may include a relatively low fraction of sequences for target nucleic acids related to the specified medical condition and a relatively high fraction of sequences for background nucleic acids that are substantially unrelated to the medical condition. Filtering engine 124 performs a filtering operation on the initial sequence data 202 to generate filtered sequence data 204. Filtered sequence data 204 virtually enriches target nucleic acids in the specimen by discarding sequence data for all or some of the background nucleic acids. As a result, filtered sequence data 204 may include a higher fraction of sequences for the target nucleic acids than the fraction of the same sequences in initial sequence data 202. In some implementations, it is unnecessary for the filtering operation to completely distill the cohort of sequences to just those sequences for target nucleic acids. While complete distillation may be achieved in some cases, for many applications it is sufficient for the filtering operation to increase the fraction of sequences for the target nucleic acids, relative to the fraction of sequences for the background nucleic acids, to a threshold level at which the data is usable to generate a classification result with a minimally specified confidence level.
Methylation profiler 126 then processes the filtered sequence data 204 to generate a methylation profile 206 that identifies a methylation status (e.g., methylation level) at each locus of a set of genomic loci within the target nucleic acids, or within a combination of target and background nucleic acids, represented in filtered sequence data 204. Machine-learning model evaluator 128 then uses a trained machine-learning model to process methylation profile 206 and generate a classification result 208 indicating the system's prediction as to whether the patient exhibits the specified medical condition.

Referring to FIG. 3, an illustration is shown of a filtering operation that increases the fraction of sequences for target nucleic acids in a set of sequence data. The top bar in FIG. 3 represents a collection of initial sequence data, e.g., initial sequence data 202. The initial sequence data 202 includes a relatively small fraction 302 of target nucleic acid sequences and a relatively high fraction 304 of background (non-targeted) nucleic acid sequences. Because sequences for background nucleic acids dominate the initial data set 202, the signal from the target nucleic acid sequences, which have greater predictive power, is diluted. After the filtering operation, however, the filtered sequence data set 204 has a significantly higher fraction of target sequences 302 than the initial data set 202, and a lower fraction of non-target sequences 304.

FIG. 4 depicts a representation of a set of methylation patterns 404 a-n on multiple fragments of cell-free DNA of an individual. Methyl groups (CH₃) bonded to the DNA at certain CpG sites form a methylation pattern on each fragment. In some implementations, methylation sites are detected by first treating the DNA with bisulfite, which converts unmethylated cytosines to uracil while leaving methylated cytosines unchanged, and then interpreting cytosines that remain in the sequenced strand as methylated sites in the original DNA strand and the converted sites as unmethylated. Other techniques for detecting methylated loci in nucleic acids may also be employed instead of, or in addition to, bisulfite conversion; FIG. 4 illustrates one suitable approach.

In some implementations, the systems and methods disclosed herein can be applied to generate a classification result indicating whether a patient likely does or does not have a specified medical condition according to the process 500 depicted in FIG. 5. The process 500 is based on the observation that the methylome(s) of certain tissue(s) in a person or other mammal can be affected by certain medical conditions that reflect diseases or abnormalities in the body (e.g., preeclampsia, endometriosis, necrotizing enterocolitis, or other conditions described herein), and that the changes in these methylomes can lead to changes in the methylation patterns of the DNA fragments released by these tissue(s) and found in a biological sample (e.g., a plasma, stool, urine, or saliva sample). The term “methylome,” as used herein, refers to the amount or pattern of methylation at different sites or regions within a population of cells. The methylome can correspond to all of the genome, a subset of the genome (e.g., repeat elements in the genome), or a portion of the subset (e.g., those areas found to be associated with a specified medical condition). A methylome from plasma can be referred to as a “plasma fluid methylome” or a “plasma fluid DNA methylome.” The plasma fluid methylome is an example of a cell-free methylome that includes cell-free DNA (cfDNA). A methylome from stool can be referred to as a “stool fluid methylome” or a “stool fluid DNA methylome.” The stool fluid methylome is an example of a cell-free methylome that includes cell-free DNA (cfDNA).

The process 500 was developed to identify changes of methylation patterns in the methylome of a biological sample caused by phenotypes of certain tissues affected by the abnormal medical condition (e.g., intestinal tissue for necrotizing enterocolitis or uterine tissue for endometriosis). One insight behind this process 500 was that the methylome of the DNA fragments in these biological samples is a mixture of a variety of component methylomes, and that the proportion of these different component methylomes in the mixture varies from subject to subject, even among the population with normal tissue phenotype. By constructing a model methylome for a biological sample as a linear combination of various component methylomes, the process 500 can accurately predict the methylation patterns of a new biological sample under the hypothesis that it is from a normal individual. Consequently, the process 500 can exhibit high sensitivity in detecting abnormal methylation patterns in a biological sample caused by changes of the methylomes of some tissues (e.g., intestinal tissues) when the sample is from an affected individual. The process 500 can be performed by any of the computing systems described herein, such as systems 102, 110, and 120 shown in the environment of FIG. 1.

Let $i$ be any CpG site in the human genome, $z_{i,j}$ the methylation level of CpG site $i$ in a biological sample $j$, $p_{i,r,j}$ the proportion of the $r$-th component methylome $m_{r,j}$ of a particular tissue origin in sample $j$ at site $i$, and $m_{i,r,j}$ the methylation level of CpG site $i$ in methylome $m_{r,j}$. The system models the scenario as follows:

$$z_{i,j} = \sum_{r=1}^{R} p_{i,r,j}\, m_{i,r,j} \qquad (1)$$

where $p_{i,r,j} \ge 0$, $0 \le m_{i,r,j} \le 1$, and $p_{i,1,j} + \dots + p_{i,R,j} = 1$.
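As a worked instance of Equation (1), consider a hypothetical CpG site $i$ in a sample $j$ whose DNA is a mixture of $R = 2$ component methylomes. The proportions and component methylation levels below are illustrative values chosen only for the arithmetic, not measured data.

$$p_{i,1,j} = 0.7,\quad p_{i,2,j} = 0.3,\quad m_{i,1,j} = 0.9,\quad m_{i,2,j} = 0.1$$

$$z_{i,j} = \sum_{r=1}^{2} p_{i,r,j}\, m_{i,r,j} = 0.7 \times 0.9 + 0.3 \times 0.1 = 0.63 + 0.03 = 0.66$$

Because the proportions are non-negative and sum to 1, $z_{i,j}$ is a convex combination of the component levels and necessarily lies between 0.1 and 0.9.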

The model assumes that there is a set of CpG sites $S$ such that, for any CpG site $i \in S$ and any biological sample $j$ of a particular type (e.g., plasma, stool, cerebrospinal fluid, saliva, or urine) from a normal individual, $m_{i,r,j} = m_{i,r}$ and $p_{i,r,j} = p_{r,j}$.

That is, the model assumes that in any biological sample from a normal individual, the proportions of the different component methylomes in the mixture are the same for all CpG sites in $S$. The model further assumes that, when restricted to the set of CpG sites $S$, biological samples from all normal individuals have the same set of component methylomes. These are called restricted reference component methylomes (RRCM), and are labeled $m_1^S, \dots, m_R^S$, or simply $m_1, \dots, m_R$ when there is no ambiguity. For any biological sample $j$ from a normal individual, its methylome restricted to the set of CpG sites in $S$ can be expressed as a weighted average of the restricted reference component methylomes. More precisely, let $z_j^S$ be the methylome of biological sample $j$ restricted to $S$; then, for some mixture vector $p_j = [p_{j,1}, \dots, p_{j,R}]^T$,

$$z_j^S = [m_1^S, \dots, m_R^S]\, p_j \qquad (2)$$

The model also assumes that the set $S$ is the union of two disjoint subsets $C$ and $T$, where $T$ is the union of $K$ non-empty sets, $T = \bigcup_{k=1}^{K} T_k$, and the index $k$ represents the $k$-th type of abnormal tissue phenotype (e.g., of intestinal tissue). The sets $T_k$ need not be disjoint. Moreover, each $T_k$ is itself the union of two disjoint sets $D_k$ and $V_k$; either $D_k$ or $V_k$ may be empty, but not both. It is assumed that for any biological sample, including one from an abnormal individual, its methylome restricted to the CpG sites in $C$ can always be expressed as a weighted average of the restricted reference component methylomes. That is, $z_j^C = [m_1^C, \dots, m_R^C]\, p_j$ regardless of whether $j$ is from an abnormal individual. $C$ is called the set of reference CpG sites. On the other hand, for a biologic sample $l$ from an abnormal individual, its methylome restricted to the CpG sites in $S = C \cup T$ can no longer be expressed as a weighted average of the restricted reference component methylomes. That is, $w_l^S \neq [m_1^S, \dots, m_R^S]\, p$ for any mixture vector $p$. More specifically, for a biologic sample $l$ from the $k$-th type of abnormal individual: 1) $w_l^C = [m_1^C, \dots, m_R^C]\, p_l$; 2) if $D_k$ is non-empty, then $w_l^{D_k} = [m_{1,k}^{D_k}, \dots, m_{R,k}^{D_k}]\, p_l$, where $[m_1^{D_k}, \dots, m_R^{D_k}] \neq [m_{1,k}^{D_k}, \dots, m_{R,k}^{D_k}]$; and 3) if $V_k$ is non-empty, then $w_l^{V_k} = [m_1^{V_k}, \dots, m_R^{V_k}]\, q_l$, where $p_l \neq q_l$. In other words, in a biologic sample from the $k$-th type of abnormal individual, if the set $D_k$ is not empty, the component methylomes of sample $l$ restricted to $D_k$ are no longer the same as the reference component methylomes restricted to $D_k$.
If the set $V_k$ is not empty, then in this biologic sample the proportions of the reference component methylomes restricted to $V_k$ are no longer the same as the proportions of the reference component methylomes restricted to $C$.

$T$ is called the target set of CpG sites, $D_k$ is called the differential methylation target set, $V_k$ is called the copy number variation target set, and $T_k$ is called the target set for the $k$-th type of abnormal individual.

Certain operations of the process 500 are depicted in the flowchart of FIG. 5, which may be carried out by a system of one or more computers (e.g., systems 102, 110, and/or 120). First, the system identifies the set of reference CpG sites $C$ and the target sets $T_1, \dots, T_K$ for the $K$ types of abnormal individuals (502). The system then estimates the restricted reference component methylomes $m_1, \dots, m_R$, or $R$ predictor methylomes $n_1, \dots, n_R$ that are independent linear combinations of the reference component methylomes such that $n_r = [m_1, \dots, m_R]\, q_r$ for $R$ linearly independent mixture vectors $q_1, \dots, q_R$ (504). If the reference component methylomes are available, the system estimates the proportions of these components at the reference CpG sites $C$ for the test biologic samples (506). The system predicts the methylation levels of the test biologic samples at the target set $T_k$ of CpG sites under the hypothesis that the sample is from a normal individual (508). Then, the predicted methylation levels at $D_k$ and $V_k$ are compared against the observed methylation levels, and the system rejects the null hypothesis that a test sample is from a normal individual if the observed methylation levels are significantly different from the predicted levels (510). In this manner, a classification result can be generated indicating whether the person or other mammal from whom the biologic sample was obtained either has the specified medical condition (abnormal) or does not have the specified medical condition (normal).
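Steps 506 through 510 can be sketched for the two-component case that the document later adopts, where the constrained regression (zero intercept, non-negative coefficients summing to 1) reduces to fitting a single weight $a$. In this hypothetical Python sketch, `normal_sd` is an assumed per-site standard deviation estimated from normal training samples, and the root-mean-square standardized deviation compared against `cutoff` is a simple stand-in for the unspecified significance test, not the disclosed test.

```python
import math
from typing import Dict, List, Tuple


def fit_two_component(z: List[float], x: List[float], y: List[float]) -> float:
    """Least-squares weight a for z ~ a*x + (1-a)*y with 0 <= a <= 1 (no intercept).

    This is the zero-intercept, non-negative, sum-to-one regression for two
    components: regress (z - y) on (x - y). The objective is a convex quadratic
    in a, so clipping the unconstrained optimum to [0, 1] gives the constrained one.
    """
    num = sum((zi - yi) * (xi - yi) for zi, xi, yi in zip(z, x, y))
    den = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    if den == 0:
        raise ValueError("reference methylomes do not differ on these sites")
    return min(1.0, max(0.0, num / den))


def classify_sample(sample: Dict[str, float], ref_x: Dict[str, float],
                    ref_y: Dict[str, float], reference_sites: List[str],
                    target_sites: List[str], normal_sd: Dict[str, float],
                    cutoff: float = 3.0) -> Tuple[float, float, bool]:
    """Return (estimated proportion, RMS standardized deviation, is_abnormal)."""
    # Step 506: estimate the mixture proportion from the reference sites C.
    a = fit_two_component([sample[i] for i in reference_sites],
                          [ref_x[i] for i in reference_sites],
                          [ref_y[i] for i in reference_sites])
    # Step 508: predict target-site levels under the "normal" hypothesis.
    z_scores = [(sample[h] - (a * ref_x[h] + (1 - a) * ref_y[h])) / normal_sd[h]
                for h in target_sites]
    # Step 510 (stand-in test): reject "normal" if deviations are too large.
    rms = math.sqrt(sum(s * s for s in z_scores) / len(z_scores))
    return a, rms, rms > cutoff


# Toy reference methylomes and samples (made-up values).
ref_x = {"c1": 0.9, "c2": 0.1, "c3": 0.8, "t1": 0.7, "t2": 0.2}
ref_y = {"c1": 0.1, "c2": 0.9, "c3": 0.2, "t1": 0.3, "t2": 0.6}
sd = {"t1": 0.02, "t2": 0.02}
normal = {"c1": 0.58, "c2": 0.42, "c3": 0.56, "t1": 0.54, "t2": 0.36}
abnormal = dict(normal, t1=0.80)  # same mixture, but site t1 deviates
for s in (normal, abnormal):
    print(classify_sample(s, ref_x, ref_y, ["c1", "c2", "c3"], ["t1", "t2"], sd))
```

The toy "normal" sample is an exact 0.6/0.4 mixture of the two references, so only the altered target site drives the abnormal call.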

Process 500 of FIG. 5 can be implemented in a number of ways. For example, given the methyl-sequence data for a set of biologic samples from normal individuals, an expectation-maximization (EM) technique or data augmentation technique can be applied to estimate the component methylomes, and the maximum likelihood method can then be used to estimate the proportions of these component methylomes in the test sample. Below are certain implementations that employ linear regression.

In some implementations of the presently disclosed process 500, it is assumed that the restricted methylome of a biologic sample from a normal individual can be approximated by a mixture of two restricted reference methylomes, one representing the DNA fragments from a first specific tissue region (e.g., an intestinal tissue region for necrotizing enterocolitis) and the other representing the DNA fragments from a second specific tissue region. It is further assumed that estimates of these two reference component methylomes are available. The implementation of the process 500 includes the following steps.

To begin, identify the reference set $C$ and the target sets $T_1, \dots, T_K$ (502). First, collect the methylation data for a set of first cell type samples, a set of second cell type samples, and a set of biologic samples, all from normal individuals. For each type of abnormal individual, collect a set of first cell type samples, a set of second cell type samples, and a set of biologic samples from that type of abnormal individual. All of these samples should be matched for age, race, and other relevant parameters. These are the training data. Next, let $x_{i,j}$ be the observed methylation level of CpG site $i$ in a normal first cell type sample $j$, $y_{i,l}$ the observed methylation level of CpG site $i$ in a normal second cell type sample $l$, $s_{x,i}^2$ the sample variance of $x_{i,j}$ over all normal first cell type samples, and $s_{y,i}^2$ the sample variance of $y_{i,l}$ over all normal second cell type samples. Identify the CpG sites $S_0$ such that for any $i \in S_0$, both $s_{x,i}^2 < c_0$ and $s_{y,i}^2 < c_0$ for some constant $c_0$. These are CpG sites with stable methylation levels in each type of normal cell. Next, let $x_{i,j}$ be the observed methylation level of CpG site $i$ in a first cell type sample $j$, normal or abnormal, $y_{i,l}$ the observed methylation level of CpG site $i$ in a second cell type sample $l$, normal or abnormal, $s_{x,i}^2$ the sample variance of $x_{i,j}$ over all first cell type samples, normal and abnormal, and $s_{y,i}^2$ the sample variance of $y_{i,l}$ over all second cell type samples, normal and abnormal.
Identify the CpG sites $S_1$ such that for any $i \in S_1$: both $s_{x,i}^2 < c_0$ and $s_{y,i}^2 < c_0$ for some constant $c_0$; for every abnormal phenotype $k$ of the first cell type, the statistical test for the difference between $\{x_{i,j_0} : j_0 \text{ is a normal first cell type sample}\}$ and $\{x_{i,j_k} : j_k \text{ is a first cell type sample of the } k\text{-th abnormal phenotype}\}$ is not significant; and, for every abnormal phenotype $k$ of the second cell type, the statistical test for the difference between $\{y_{i,j_0} : j_0 \text{ is a normal second cell type sample}\}$ and $\{y_{i,j_k} : j_k \text{ is a second cell type sample of the } k\text{-th abnormal phenotype}\}$ is not significant. These are CpG sites with stable methylation levels in each type of cell and with no difference in methylation level between normal samples and any abnormal samples. Let $x_i$ be the sample mean of $x_{i,j}$ over all first cell type samples, normal and abnormal, and $y_i$ the sample mean of $y_{i,l}$ over all second cell type samples, normal and abnormal. Identify the subset $C_0$ of $S_1$ such that for any $i \in C_0$, $|x_i - y_i| > c_1$ for some constant $c_1$. These are CpG sites that are stably methylated in each cell type, show no difference between the normal and abnormal samples of the same cell type, and are differentially methylated between the two cell types. Next, let $x^{C_0}$ be the vector of $x_i$ for all $i \in C_0$, and $y^{C_0}$ the vector of $y_i$ for all $i \in C_0$, where $x_i$ is the mean methylation at site $i$ in all first cell type samples and $y_i$ is the mean methylation at site $i$ in all second cell type samples. Note that, by the way the set $C_0$ is selected, there is no difference in the methylation level of any CpG site in $C_0$ between normal and abnormal first cell type samples, or between normal and abnormal second cell type samples. Let $z_j^{C_0}$ be the observed methylation levels of CpG sites in $C_0$ for a biologic sample $j$ of the $k$-th abnormal type. (For convenience, a normal biologic sample is referred to as a sample of the 0-th abnormal type.)
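The selection of $S_0$, $S_1$, and $C_0$ can be sketched with standard-library statistics. The document does not name the statistical test used for the normal-versus-abnormal comparison, so this hypothetical sketch substitutes Welch's t statistic with an assumed cutoff `t_cut`; the data layout (site id mapped to a list of per-sample levels) and all values are illustrative.

```python
import math
from statistics import mean, variance
from typing import Dict, List, Set, Tuple

Levels = Dict[str, List[float]]  # CpG site id -> methylation levels across samples


def low_variance(levels: Levels, c0: float) -> Set[str]:
    """Sites whose sample variance across the given samples is below c0."""
    return {i for i, values in levels.items() if variance(values) < c0}


def welch_t(a: List[float], b: List[float]) -> float:
    """Welch's t statistic, used here as a stand-in for the unspecified test."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    diff = mean(a) - mean(b)
    if se == 0:
        return 0.0 if diff == 0 else math.inf
    return diff / se


def select_sites(normal_x: Levels, normal_y: Levels,
                 abnormal_x: List[Levels], abnormal_y: List[Levels],
                 c0: float, c1: float, t_cut: float) -> Tuple[Set[str], Set[str], Set[str]]:
    """Return (S_0, S_1, C_0); abnormal_x[k-1] holds first-cell-type data for phenotype k."""
    # S_0: stable in both normal cell types.
    s0 = low_variance(normal_x, c0) & low_variance(normal_y, c0)
    # Pool normal and all abnormal samples for each cell type.
    pooled_x = {i: normal_x[i] + [v for ax in abnormal_x for v in ax[i]] for i in normal_x}
    pooled_y = {i: normal_y[i] + [v for ay in abnormal_y for v in ay[i]] for i in normal_y}
    # S_1: stable when pooled, with no detectable normal-vs-abnormal shift for any phenotype.
    s1 = {i for i in low_variance(pooled_x, c0) & low_variance(pooled_y, c0)
          if all(abs(welch_t(normal_x[i], ax[i])) < t_cut for ax in abnormal_x)
          and all(abs(welch_t(normal_y[i], ay[i])) < t_cut for ay in abnormal_y)}
    # C_0: sites in S_1 whose cell-type means differ by more than c1.
    c0_sites = {i for i in s1 if abs(mean(pooled_x[i]) - mean(pooled_y[i])) > c1}
    return s0, s1, c0_sites


# Made-up levels: site "a" is a good reference site, "b" shifts in abnormal
# first-cell-type samples, and "c" is unstable.
normal_x = {"a": [0.90, 0.91, 0.89], "b": [0.50, 0.52, 0.48], "c": [0.1, 0.9, 0.5]}
normal_y = {"a": [0.10, 0.11, 0.09], "b": [0.50, 0.49, 0.51], "c": [0.5, 0.5, 0.5]}
abnormal_x = [{"a": [0.90, 0.92, 0.88], "b": [0.80, 0.81, 0.79], "c": [0.2, 0.8, 0.5]}]
abnormal_y = [{"a": [0.10, 0.12, 0.08], "b": [0.50, 0.50, 0.52], "c": [0.5, 0.5, 0.5]}]
s0, s1, c0_sites = select_sites(normal_x, normal_y, abnormal_x, abnormal_y,
                                c0=0.01, c1=0.3, t_cut=3.0)
print(sorted(s0), sorted(s1), sorted(c0_sites), sorted(s0 - s1))  # ['a', 'b'] ['a'] ['a'] ['b']
```

The last printed set, $S_0 \setminus S_1$, is the candidate pool $T_0$ used later when identifying target sites.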
For each sample $j$ of the $k$th abnormal type, regress $z_j^{C_0}$ against $x^{C_0}$ and $y^{C_0}$, with the constraints that the intercept is 0 and the coefficients are non-negative and sum to 1, and obtain the residual $e_j^{C_0}$. Identify the subset $C_0^k$ of $C_0$ such that every CpG site $i$ in $C_0^k$ satisfies

${\frac{e_{i,k}^{2}}{s_{i,k}^{2}} < c_{2}},$

and $e_{i,k}^2 < c_3$ for some constants $c_2$ and $c_3$, where $e_{i,k}^2$ is the mean squared difference between the estimated and observed methylation levels of CpG site $i$ over all biologic samples of the $k$th abnormal type, and $s_{i,k}^2$ is the sample variance of the methylation levels of CpG site $i$ in the same set of biologic samples. Repeat this procedure for each type of abnormal biologic sample; the intersection $C = \bigcap_{k=0}^{K} C_0^k$ of the resulting subsets is the reference set of CpG sites. These are CpG sites whose methylation levels in normal biologic samples and in biologic samples of any abnormal type can be accurately predicted from the reference component methylomes of normal individuals.
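The residual-based screen and the intersection over sample types can be sketched as below. This assumes the constrained-regression residuals and the observed methylation levels for one sample type $k$ are available as arrays of shape (sites, samples), and it takes the ratio criterion with the sample variance $s_{i,k}^2$ in the denominator. The function names are hypothetical.

```python
import numpy as np


def well_predicted_sites(residuals, observed, c2, c3):
    """Boolean mask of sites meeting both fit criteria for one sample type k.

    residuals: array (n_sites, n_samples) of constrained-regression residuals.
    observed:  array (n_sites, n_samples) of observed methylation levels.
    """
    E = np.asarray(residuals, dtype=float)
    Z = np.asarray(observed, dtype=float)
    mse = np.mean(E ** 2, axis=1)          # e_{i,k}^2: mean squared residual
    var = np.var(Z, axis=1, ddof=1)        # s_{i,k}^2: sample variance
    with np.errstate(divide="ignore", invalid="ignore"):
        ratio = mse / var                  # inf or nan where var == 0
    # Comparisons with nan are False, so zero-variance sites are excluded.
    return (ratio < c2) & (mse < c3)


def reference_set(masks):
    """Intersect the per-type masks C_0^0, ..., C_0^K into the reference set C."""
    return np.logical_and.reduce([np.asarray(m, dtype=bool) for m in masks])
```

Excluding zero-variance sites is a design choice of this sketch; an implementation could instead rely on the $e_{i,k}^2 < c_3$ criterion alone at such sites.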

Next, let $T_0 = S_0 \setminus S_1$. Let $x^C$ and $x^{T_0}$ be the vectors of $x_i$ and $x_h$ for all $i \in C$ and $h \in T_0$, respectively, and let $y^C$ and $y^{T_0}$ be the vectors of $y_i$ and $y_h$ for all $i \in C$ and $h \in T_0$, respectively, where $x_i$ and $x_h$ are the mean methylation levels at sites $i$ and $h$ over the normal first cell type samples, and $y_i$ and $y_h$ are the mean methylation levels at sites $i$ and $h$ over the normal second cell type samples. Let $z_j^C$ and $z_j^{T_0}$ be the observed methylation levels of the CpG sites in $C$ and $T_0$, respectively, for a normal biologic sample $j$; let $w_{l_k}^C$ and $w_{l_k}^{T_0}$ be the observed methylation levels of the CpG sites in $C$ and $T_0$, respectively, for a biologic sample $l_k$ from an individual with the $k$th type of abnormality; and let $w_{l_g}^C$ and $w_{l_g}^{T_0}$ be the corresponding levels for a biologic sample $l_g$ from an individual with the $g$th type of abnormality, where $g \neq k$. For each $j$, $l_k$, and $l_g$, regress $z_j^C$, $w_{l_k}^C$, and $w_{l_g}^C$, respectively, against $x^C$ and $y^C$, with the constraints that the intercept is 0 and the coefficients are non-negative and sum to 1. Apply the fitted models to $x^{T_0}$ and $y^{T_0}$ to predict $z_j^{T_0}$, $w_{l_k}^{T_0}$, and $w_{l_g}^{T_0}$, respectively, and obtain the differences $e_j^{T_0}$, $e_{l_k}^{T_0}$, and $e_{l_g}^{T_0}$ between the predicted and observed values. Let $e_i$, $e_{i,k}$, and $e_{i,g}$ be the means, for CpG site $i$, of the sets of differences $\{e_j^{T_0} : j \text{ is a normal biologic sample}\}$, $\{e_{l_k}^{T_0} : l_k \text{ is a biologic sample of the } k\text{th abnormal type}\}$, and $\{e_{l_g}^{T_0} : l_g \text{ is a biologic sample of the } g\text{th abnormal type}\}$, respectively. Identify the subset $T_k$ of $T_0$ such that every $i \in T_k$ satisfies $|e_i| < c_{2,0}$, $|e_{i,k}| > c_{2,k}$, and $|e_{i,k} - e_{i,g}| > c_{3,k}$ for all $g \neq k$, for some constants $c_{2,0}$, $c_{2,k}$, and $c_{3,k}$. $T_k$ is the target set for the $k$th type of abnormal individual.
These are the sites at which the methylation of a normal biologic sample can be accurately predicted, the observed methylation of a biologic sample of the $k$th abnormal type deviates from the prediction, and that deviation differs from the deviation observed in biologic samples of any other abnormal type.
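Given the mean residual vectors over $T_0$, the three threshold conditions that define $T_k$ can be sketched as a mask computation. This is an illustration only, assuming the mean residuals have already been computed as NumPy-compatible vectors; the function name and argument order are hypothetical.

```python
import numpy as np


def select_target_sites(e_normal, e_k, e_others, c20, c2k, c3k):
    """Boolean mask over T_0 for the target set T_k.

    e_normal: mean residuals e_i over normal biologic samples.
    e_k: mean residuals e_{i,k} over samples of the kth abnormal type.
    e_others: iterable of mean-residual vectors e_{i,g}, one per g != k.
    """
    e_normal = np.asarray(e_normal, dtype=float)
    e_k = np.asarray(e_k, dtype=float)
    # Normal samples are well predicted; type-k samples deviate from prediction.
    mask = (np.abs(e_normal) < c20) & (np.abs(e_k) > c2k)
    # The type-k deviation must differ from every other abnormal type's.
    for e_g in e_others:
        mask &= np.abs(e_k - np.asarray(e_g, dtype=float)) > c3k
    return mask
```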

Next, the system estimates the cell type fractions of each new biologic sample to be tested. Recall that $x^C$ and $y^C$ are the mean vectors of the methylation levels in the training first cell type data and training second cell type data, respectively, for the CpG sites in the reference set $C$. For any new biologic sample $t$ to be tested, let $z_t^C$ be the observed methylation levels of the CpG sites in $C$. Regress $z_t^C$ against $x^C$ and $y^C$, with the constraints that the intercept is 0 and the coefficients are non-negative and sum to 1. The estimated coefficients for $x^C$ and $y^C$ are the estimated fractions of the first and second cell types, respectively, in biologic sample $t$.
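With two components, this constrained regression has a closed-form solution. Writing the fit as $z_t^C \approx p\,x^C + (1 - p)\,y^C$ turns it into a one-parameter least-squares problem for $p$ on $[0, 1]$, and because the objective is a convex quadratic in $p$, clipping the unconstrained estimate to $[0, 1]$ gives the constrained optimum. The sketch below assumes this two-component setting; `estimate_fraction` is a hypothetical name, and with more than two components a general constrained least-squares solver would be needed instead.

```python
import numpy as np


def estimate_fraction(z, x, y):
    """Fit z ~ p*x + (1 - p)*y with p in [0, 1] and no intercept.

    Returns (p, residual): p is the estimated first cell type fraction,
    1 - p the second, and residual = z - fitted values.
    """
    z, x, y = (np.asarray(a, dtype=float) for a in (z, x, y))
    d = x - y
    denom = float(d @ d)
    if denom == 0.0:
        raise ValueError("reference vectors x and y are identical")
    # Unconstrained least-squares slope for (z - y) ~ p * (x - y).
    p_hat = float(d @ (z - y)) / denom
    # Convex quadratic in p: clipping yields the constrained minimizer.
    p = min(max(p_hat, 0.0), 1.0)
    residual = z - (p * x + (1.0 - p) * y)
    return p, residual
```

For example, a sample synthesized as `0.3 * x + 0.7 * y` returns `p` equal to 0.3 up to rounding and residuals near zero.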

The system can then test whether a new biologic sample is from an individual with the $k$th type of abnormality. For the new biologic sample $t$ (e.g., plasma), let $x^{T_k}$ and $y^{T_k}$ be the mean vectors of the methylation levels in the training first cell type data and second cell type data for the CpG sites in the target set $T_k$ identified in step 1 of process 500, and apply the fitted regression model obtained in step 2 of process 500 to $x^{T_k}$ and $y^{T_k}$ to predict the methylation levels of the CpG sites in $T_k$ for sample $t$ under the hypothesis that sample $t$ is from a normal individual. Let $n_k$ be the number of CpG sites in $T_k$. Define the functions

$f_{k}(x_{1}, \ldots, x_{n_{k}}) = \sum_{i} (-1)^{I(e_{i,k} - e_{i})} x_{i}$ and $f_{k,g}(x_{1}, \ldots, x_{n_{k}}) = \sum_{i} (-1)^{I(e_{i,k} - e_{i,g})} x_{i},$

where $I(\cdot) = I_{(-\infty,0)}(\cdot)$ is the indicator function of the interval $(-\infty, 0)$, and $e_i$, $e_{i,k}$, and $e_{i,g}$ are the estimates obtained in step 1.5 of process 500. The sample is classified as being from an individual with the $k$th type of abnormal phenotype if

$f_{k}(e_{1,t} - e_{1}, \ldots, e_{n_{k},t} - e_{n_{k}}) > c_{4,k}$ and $f_{k,g}(e_{1,t} - e_{1,g}, \ldots, e_{n_{k},t} - e_{n_{k},g}) > c_{5,g}$ for all $g \neq k$,

where $e_{i,t}$ is the difference between the observed methylation level of CpG site $i \in T_k$ in sample $t$ and the value predicted by the fitted model obtained in step 2, and $g$ ranges over every abnormal phenotype other than the $k$th.
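The decision rule can be sketched as follows, assuming the residual vectors are NumPy-compatible and indexed over the $n_k$ sites of $T_k$. The weight $(-1)^{I(e_{i,k} - e_i)}$ is $+1$ where $e_{i,k} - e_i \ge 0$ and $-1$ where it is negative, so $f_k$ is a signed sum that grows when the test sample deviates from the normal expectation in the same direction as the $k$th abnormal type. The function names and the dictionary-based interface for the other types $g \neq k$ are hypothetical.

```python
import numpy as np


def sign_weights(e_k, e_ref):
    """(-1)^{I_(-inf,0)(e_{i,k} - e_ref_i)}: -1 where the difference is negative, else +1."""
    diff = np.asarray(e_k, dtype=float) - np.asarray(e_ref, dtype=float)
    return np.where(diff < 0, -1.0, 1.0)


def signed_sum(weights, x):
    """f(x_1, ..., x_n) = sum_i w_i * x_i."""
    return float(np.sum(weights * np.asarray(x, dtype=float)))


def is_type_k(e_t, e_normal, e_k, e_others, c4k, c5):
    """Apply the decision rule for the kth abnormal type.

    e_t: residuals e_{i,t} of test sample t at the sites of T_k.
    e_normal, e_k: mean residuals e_i and e_{i,k} from training.
    e_others: dict mapping each g != k to its mean residuals e_{i,g}.
    c5: dict mapping each g != k to its threshold c_{5,g}.
    """
    e_t = np.asarray(e_t, dtype=float)
    if signed_sum(sign_weights(e_k, e_normal), e_t - np.asarray(e_normal)) <= c4k:
        return False
    for g, e_g in e_others.items():
        if signed_sum(sign_weights(e_k, e_g), e_t - np.asarray(e_g)) <= c5[g]:
            return False
    return True
```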

Other ways of implementing process 500 can be developed by modifying the implementation presented above. In particular, the process need not assume that the biologic methylomes are made up of only two component reference methylomes, nor does it need to estimate those component methylomes directly. Instead, a set of predictor methylomes that are mixtures of the component reference methylomes can be collected, as long as the number of predictor methylomes equals the number of reference component methylomes and the mixture vectors of the predictor methylomes are linearly independent. For example, the predictor methylomes can be methylomes of biologic samples with known, different proportions of first and second cell type DNA.
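The linear-independence condition can be checked numerically, and the reason it suffices can be illustrated. If the component methylomes form the columns of a matrix $M$ and the predictor methylomes are $P = MA$ for an invertible mixture matrix $A$, then any biologic methylome $Mp$ equals $P A^{-1} p$, so it lies in the span of the predictors. The sketch below is an assumption-laden illustration: the function names are hypothetical, and it uses unconstrained least squares because the coefficients $A^{-1} p$ on the predictors need not be non-negative.

```python
import numpy as np


def predictors_are_valid(mixture_matrix):
    """Check that R predictor methylomes can stand in for R components.

    mixture_matrix: (R, R) array whose column q is the mixture vector of
    predictor methylome q (entry [r, q] = proportion of component r).
    """
    A = np.asarray(mixture_matrix, dtype=float)
    if A.ndim != 2 or A.shape[0] != A.shape[1]:
        return False
    return bool(np.linalg.matrix_rank(A) == A.shape[0])


def fitted_from_predictors(z, P):
    """Least-squares fit of observed levels z on predictor methylomes P (sites, R)."""
    P = np.asarray(P, dtype=float)
    coef, *_ = np.linalg.lstsq(P, np.asarray(z, dtype=float), rcond=None)
    return P @ coef
```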

In process 500, the difference between the observed methylation levels in certain target regions and the predicted methylation levels is used as the test statistic to determine whether the methylome of a biologic sample has been affected by some type of tissue abnormality. To illustrate the advantage of this approach, assume that the mixture vector $p_j$ for the methylome of a normal biologic sample $j$ follows a Dirichlet distribution with parameters $\alpha_1 = \cdots = \alpha_R$. Furthermore, assume that for CpG site $i$, the methylation levels in the $R$ reference component methylomes are $m_{i,r} = (r - 1)/(R - 1)$ for $r = 1, \ldots, R$. It can be shown that the methylation level of site $i$ in sample $j$ then has a mean of 0.5 and a variance of

$\frac{R + 1}{12\left( {R - 1} \right)\left( {{R\alpha_{1}} + 1} \right)}.$
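One way to see this uses the standard moments of a symmetric Dirichlet distribution with total concentration $R\alpha_1$: writing $\mu_{i,j} = \sum_r m_{i,r} p_{r,j}$ for the true methylation level of site $i$ in sample $j$, we have $E[p_{r,j}] = 1/R$ and $\mathrm{Cov}(p_{r,j}, p_{s,j}) = (\delta_{rs}/R - 1/R^{2})/(R\alpha_1 + 1)$, where $\delta_{rs}$ is the Kronecker delta. Then

$E[\mu_{i,j}] = \frac{1}{R}\sum_{r=1}^{R}\frac{r - 1}{R - 1} = \frac{1}{2},$

$\mathrm{Var}(\mu_{i,j}) = \frac{1}{R\alpha_1 + 1}\left[\frac{1}{R}\sum_{r=1}^{R} m_{i,r}^{2} - \left(\frac{1}{R}\sum_{r=1}^{R} m_{i,r}\right)^{2}\right] = \frac{1}{R\alpha_1 + 1}\cdot\frac{R + 1}{12\left(R - 1\right)},$

where the bracketed term is the variance of the $R$ equally spaced values $0, 1/(R - 1), \ldots, 1$, namely $(R^{2} - 1)/(12(R - 1)^{2}) = (R + 1)/(12(R - 1))$.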

If a methyl-seq library from sample $j$ has a coverage of $N$ at CpG site $i$, the variance of the measured methylation level $z_{i,j}$ is

${\sigma_{1}^{2} = {\frac{1}{4N} + {\frac{N - 1}{N}\frac{R + 1}{12\left( {R - 1} \right)\left( {{R\alpha_{1}} + 1} \right)}}}}.$

In other words, if $z_{i,j}$ is used as a test statistic to detect abnormal intestinal tissue from a biologic sample, then under the null hypothesis the test statistic has a variance of $\sigma_1^2$. In process 500, by contrast, the mixture vector $p_j$ is first estimated, and $z_{i,j}$ is then predicted by $\sum_r m_{i,r} p_{r,j}$. In methyl-seq data, each library can cover millions of CpG sites, and the variance of the coefficient estimates in a linear regression model is inversely proportional to the sample size. It is therefore possible to estimate the mixture vector $p_j$ with high accuracy, even after accounting for the tendency of adjacent CpG sites to have correlated methylation levels. Assuming that $\sum_r m_{i,r} p_{r,j}$ is estimated accurately enough that its estimation error can be ignored, the variance of the difference $z_{i,j} - \sum_r m_{i,r} p_{r,j}$ between the observed methylation level and the prediction is

$\frac{1}{4N} - {\frac{1}{N}{\frac{R + 1}{12\left( {R - 1} \right)\left( {{R\alpha_{1}} + 1} \right)}.}}$
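Both variance expressions follow from the law of total variance. The derivation below assumes, as a modeling simplification not stated explicitly in the text, that conditional on the true level $\mu_{i,j} = \sum_r m_{i,r} p_{r,j}$ the $N$ reads covering site $i$ are independent, so that $N z_{i,j} \mid \mu_{i,j} \sim \mathrm{Binomial}(N, \mu_{i,j})$. Write $V$ for the variance of $\mu_{i,j}$, that is, the expression given above for the variance of the methylation level of site $i$, and recall $E[\mu_{i,j}] = 1/2$:

$E[\mu_{i,j}(1 - \mu_{i,j})] = \frac{1}{2} - \left(V + \frac{1}{4}\right) = \frac{1}{4} - V,$

$\mathrm{Var}(z_{i,j}) = E\left[\frac{\mu_{i,j}(1 - \mu_{i,j})}{N}\right] + \mathrm{Var}(\mu_{i,j}) = \frac{1}{4N} + \frac{N - 1}{N}V = \sigma_1^{2},$

$\mathrm{Var}(z_{i,j} - \mu_{i,j}) = E\left[\frac{\mu_{i,j}(1 - \mu_{i,j})}{N}\right] = \frac{1}{4N} - \frac{V}{N},$

where the last line uses $E[z_{i,j} - \mu_{i,j} \mid \mu_{i,j}] = 0$ and treats the prediction $\sum_r m_{i,r} p_{r,j}$ as exactly equal to $\mu_{i,j}$.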

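In other words, under the null hypothesis, the test statistic $z_{i,j} - \sum_r m_{i,r} p_{r,j}$ used in process 500 has a smaller variance than the alternative test statistic $z_{i,j}$: subtracting the expression above from $\sigma_1^2$ leaves exactly $\frac{R + 1}{12(R - 1)(R\alpha_1 + 1)}$, the variance contributed by sample-to-sample variation in the mixture vector. This in turn means that the presently disclosed test achieves higher power at the same level of type I error.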
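A quick Monte Carlo check of this comparison can be run under the same assumptions: symmetric Dirichlet mixture vectors, equally spaced component methylation levels, binomial read sampling at coverage $N$, and an exact prediction of the true level. The function names, the sample size, and the settings $R = 4$, $\alpha_1 = 1$, $N = 30$ used below are illustrative assumptions rather than values from the text.

```python
import numpy as np


def theoretical_variances(R, alpha, N):
    """Closed-form variances from the text for R components and coverage N."""
    V = (R + 1) / (12 * (R - 1) * (R * alpha + 1))   # variance of the true level
    var_z = 1 / (4 * N) + (N - 1) / N * V           # sigma_1^2
    var_resid = 1 / (4 * N) - V / N                 # variance of z - prediction
    return V, var_z, var_resid


def simulated_variances(R, alpha, N, n_samples=200_000, seed=0):
    """Monte Carlo estimates of Var(z) and Var(z - mu) under the same model."""
    rng = np.random.default_rng(seed)
    m = np.arange(R) / (R - 1)                                # m_{i,r} = (r - 1)/(R - 1)
    p = rng.dirichlet(np.full(R, alpha), size=n_samples)      # mixture vectors p_j
    mu = np.clip(p @ m, 0.0, 1.0)                             # true level; clip guards rounding
    z = rng.binomial(N, mu) / N                               # measured level at coverage N
    return z.var(), (z - mu).var()
```

With `R=4`, `alpha=1.0`, `N=30`, the closed-form variances are about 0.0352 for $z_{i,j}$ and 0.0074 for the residual, and the simulated values agree to within a few percent.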

Additional techniques for detecting, assessing, monitoring, or treating preeclampsia can include those set forth in U.S. Application Ser. No. 62/832,157, which is incorporated by reference in this disclosure. Additional techniques for detecting, assessing, monitoring, or treating CNS conditions can include those set forth in U.S. Application Ser. No. 62/882,215, which is incorporated by reference in this disclosure. Additional techniques for detecting, assessing, or monitoring fetal aneuploidy can include those set forth in U.S. Application Ser. No. 62/928,156, which is incorporated by reference in this disclosure. Additional techniques for detecting, assessing, monitoring, or treating ovarian cancer can include those set forth in U.S. Application Ser. No. 63/007,218, which is incorporated by reference in this disclosure. Additional techniques for detecting, assessing, monitoring, or treating endometriosis can include those set forth in U.S. Application Ser. No. 63/007,204, which is incorporated by reference in this disclosure. Additional techniques for detecting, assessing, monitoring, or treating necrotizing enterocolitis can include those set forth in U.S. Application Ser. No. 63/007,208, which is incorporated by reference in this disclosure.

FIG. 6 is a flowchart of an example process 600 for processing sequence data for nucleic acids from a biological sample to determine a methylation profile of the sample and classify a person with respect to a specified medical condition. The process 600 can be carried out in whole or in part by systems of one or more computers, e.g., systems 102, 110, or 120. It should also be appreciated that aspects of the process 600 may incorporate operations from process 500 of FIG. 5 as previously described.

The process 600 can include obtaining an initial set of sequence data (602). The initial sequence data describes sequences of nucleic acids from a biologic sample of a person or other mammal. In some examples, the nucleic acids characterized in the initial sequence data include cell-free DNA. The initial sequence data may include cell-free DNA from various tissues of the person, including tissues affected by a specified medical condition and tissues that are not affected, or are substantially less affected, by the specified medical condition. DNA fragments corresponding to the affected tissues may be deemed target DNA (e.g., DNA from the intestinal region when assessing necrotizing enterocolitis), while fragments corresponding to the other tissues may be deemed background or non-targeted DNA. The system receives an indication of the specified medical condition that is to be screened, e.g., based on user input provided into a computing terminal (604). The initial sequence data can be filtered using a selected filter corresponding to the specified medical condition (606). In some implementations, the filtering is operable to increase a fraction of sequences for target DNA relative to other DNA. In some implementations, the filtering includes selecting target nucleic acids from an initial set of nucleic acids based on a methylation characteristic (e.g., methylation status), a copy number characteristic of the target nucleic acids, or both, and enriching the target nucleic acids, e.g., to increase a fraction of the target nucleic acids after filtering relative to their fraction in a pre-filtered set. A methylation profile can be generated from the filtered sequence data (608), and the methylation profile processed with an appropriate machine-learning model corresponding to the specified medical condition to generate a classification result (610). 
In some implementations, the machine-learning model is a model corresponding to those described with respect to the process 500 of FIG. 5. The classification result can indicate that the person's condition is normal and does not exhibit the disease or specified medical condition, or the classification result can indicate that the person's condition is abnormal and does exhibit the disease or specified medical condition. The classification result can be provided as output by the system in a number of ways (612). For example, the classification result may be presented to one or more users on an electronic display, presented audibly through an intelligent assistant device including a speaker, or transmitted to a user as a message via one or more channels (e.g., email, text messages, private social media messages). In some implementations, the system may alert the patient, his or her healthcare provider, or both, of the classification result. The healthcare provider can use the classification result to diagnose the patient with the specified medical condition, inform a determination of how and whether to treat the patient for the specified medical condition, and/or inform a determination of whether further testing or assessment should be performed to screen the patient for the specified medical condition or another medical condition (614).
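As a minimal sketch of methylation-based filtering in step 606, assume each sequenced fragment carries per-CpG methylation calls and that the target DNA for the condition of interest tends to be hypermethylated relative to background DNA, so fragments whose methylated-CpG fraction falls below a threshold are discarded. The `Read` type, the `is_target` flag (available only for evaluation, e.g., with simulated data), the threshold, and the direction of the methylation difference are all hypothetical; an actual filter would be selected per condition and could also use copy number characteristics, which are not shown.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Read:
    """Hypothetical fragment record: per-CpG methylation calls for one read."""
    cpg_calls: Sequence[bool]   # True where the CpG on this fragment is methylated
    is_target: bool = False     # known tissue of origin, used only for evaluation


def methylated_fraction(read: Read) -> float:
    """Fraction of CpGs on the fragment that are methylated (0.0 if none)."""
    calls = list(read.cpg_calls)
    return sum(calls) / len(calls) if calls else 0.0


def filter_reads(reads: List[Read], min_fraction: float) -> List[Read]:
    """Keep fragments whose methylated-CpG fraction meets the threshold."""
    return [r for r in reads if methylated_fraction(r) >= min_fraction]


def target_fraction(reads: List[Read]) -> float:
    """Fraction of fragments that derive from target DNA."""
    return sum(r.is_target for r in reads) / len(reads) if reads else 0.0
```

On a toy set of five fragments, two of them target fragments with mostly methylated CpGs, a threshold of 0.6 raises the target fraction from 0.4 before filtering to 1.0 after filtering.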

FIG. 7 is a flowchart of an example process 700 for training a machine-learning model to generate, from a methylation profile of a patient, a classification result indicating whether a specified medical condition of the patient is or is not normal. The process 700 can be performed by a system of one or more computers, e.g., system 110 from the environment of FIG. 1. While process 700 relates to the training of certain types of models (e.g., feedforward neural networks) using a supervised learning technique, further description of training techniques is disclosed with respect to FIG. 5. The system obtains a set of training samples (702). Each training sample can include (i) a set of filtered or unfiltered sequence data describing sequences of nucleic acids from a biological sample of a person and (ii) a label indicating whether or not the patient exhibits a specified medical condition. The label can serve as a target output for the model 146 when evaluated on a methylation profile derived from the sequence data in the training sample. The machine-learning model is initialized (704). For example, the weights or other parameters of the model may be set to default or randomized values. An initial training sample is then selected (706), and the sequences from the selected training sample can be filtered to increase a fraction of sequences for target nucleic acids in the sequence data (708). The system determines a methylation profile from the filtered sequence data (710), processes the methylation profile to generate an estimated classification (712), compares the estimated classification to the labeled/target classification to determine an error in the classification result (714), and then back-propagates the error through the model to update the model based on the error (e.g., using gradient descent) (716). The system checks whether the training process is complete (718), e.g., by checking whether a training termination condition is satisfied. 
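If the condition is not satisfied, the process returns to select another training sample (706) and repeats the subsequent steps. For example, training may terminate once all training samples have been consumed. The trained model can then be provided for use, e.g., by storing the model in association with a package for the corresponding medical condition (720).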
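The training loop of steps 704 through 718 can be sketched under stated simplifications. Methylation profiles are assumed to be precomputed fixed-length numeric vectors (steps 708 and 710 are not shown), a single-layer logistic model trained by per-sample gradient descent on cross-entropy loss stands in for the feedforward neural network mentioned above, and a fixed number of passes over the training samples serves as the termination condition. All names, the label encoding, and the hyperparameters are hypothetical.

```python
import numpy as np


def train_classifier(profiles, labels, lr=0.5, epochs=200, seed=0):
    """Per-sample gradient descent for a logistic model (stand-in for 704-718)."""
    X = np.asarray(profiles, dtype=float)
    y = np.asarray(labels, dtype=float)
    rng = np.random.default_rng(seed)
    w = rng.normal(scale=0.01, size=X.shape[1])        # (704) initialize weights
    b = 0.0
    for _ in range(epochs):                            # (718) fixed number of passes
        for i in rng.permutation(len(X)):              # (706) select a training sample
            prob = 1.0 / (1.0 + np.exp(-(X[i] @ w + b)))   # (712) estimated classification
            err = prob - y[i]                          # (714) error vs. label
            w -= lr * err * X[i]                       # (716) gradient step on
            b -= lr * err                              #       cross-entropy loss
    return w, b


def predict(w, b, profiles):
    """1 = exhibits the condition, 0 = normal (hypothetical label encoding)."""
    return (np.asarray(profiles, dtype=float) @ w + b > 0).astype(int)
```

For a multi-layer network, step 716 would back-propagate the same error term through the hidden layers rather than applying it directly to a single weight vector.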

FIG. 8 shows an example of a computing device 800 and a mobile computing device 850 that can be used in some embodiments to implement the techniques described herein. The computing device 800 is intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The mobile computing device is intended to represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smart-phones, and other similar computing devices. The components shown here, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the inventions described and/or claimed in this document.

The computing device 800 includes a processor 802, a memory 804, a storage device 806, a high-speed interface 808 connecting to the memory 804 and multiple high-speed expansion ports 810, and a low-speed interface 812 connecting to a low-speed expansion port 814 and the storage device 806. The processor 802, the memory 804, the storage device 806, the high-speed interface 808, the high-speed expansion ports 810, and the low-speed interface 812 are interconnected using various busses, and may be mounted on a common motherboard or in other manners as appropriate. The processor 802 can process instructions for execution within the computing device 800, including instructions stored in the memory 804 or on the storage device 806 to display graphical information for a GUI on an external input/output device, such as a display 816 coupled to the high-speed interface 808. In other implementations, multiple processors and/or multiple buses may be used, as appropriate, along with multiple memories and types of memory. Also, multiple computing devices may be connected, with each device providing portions of the necessary operations (e.g., as a server bank, a group of blade servers, or a multi-processor system).

The memory 804 stores information within the computing device 800. In some implementations, the memory 804 is a volatile memory unit or units. In some implementations, the memory 804 is a non-volatile memory unit or units. The memory 804 may also be another form of computer-readable medium, such as a magnetic or optical disk.

The storage device 806 is capable of providing mass storage for the computing device 800. In some implementations, the storage device 806 may be or contain a computer-readable medium, such as a floppy disk device, a hard disk device, an optical disk device, a tape device, a flash memory or other similar solid state memory device, or an array of devices, including devices in a storage area network or other configurations. A computer program product can be tangibly embodied in a computer- or machine-readable medium, such as the memory 804, the storage device 806, or memory on the processor 802, and may contain instructions that, when executed, perform one or more methods, such as those described above.

The high-speed interface 808 manages bandwidth-intensive operations for the computing device 800, while the low-speed interface 812 manages lower bandwidth-intensive operations. Such allocation of functions is exemplary only. In some implementations, the high-speed interface 808 is coupled to the memory 804, the display 816 (e.g., through a graphics processor or accelerator), and to the high-speed expansion ports 810, which may accept various expansion cards (not shown). In some implementations, the low-speed interface 812 is coupled to the storage device 806 and the low-speed expansion port 814. The low-speed expansion port 814, which may include various communication ports (e.g., USB, Bluetooth, Ethernet, wireless Ethernet), may be coupled to one or more input/output devices, such as a keyboard, a pointing device, a scanner, or a networking device such as a switch or router, e.g., through a network adapter.

The computing device 800 may be implemented in a number of different forms, as shown in the figure. For example, it may be implemented as a standard server 820, or multiple times in a group of such servers. In addition, it may be implemented in a personal computer such as a laptop computer 822. It may also be implemented as part of a rack server system 824. Alternatively, components from the computing device 800 may be combined with other components in a mobile device (not shown), such as a mobile computing device 850. Each such device may contain one or more of the computing device 800 and the mobile computing device 850, and an entire system may be made up of multiple computing devices communicating with each other.

The mobile computing device 850 includes a processor 852, a memory 864, an input/output device such as a display 854, a communication interface 866, and a transceiver 868, among other components. The mobile computing device 850 may also be provided with a storage device, such as a micro-drive or other device, to provide additional storage. The processor 852, the memory 864, the display 854, the communication interface 866, and the transceiver 868 are interconnected using various buses, and several of the components may be mounted on a common motherboard or in other manners as appropriate.

The processor 852 can execute instructions within the mobile computing device 850, including instructions stored in the memory 864. The processor 852 may be implemented as a chipset of chips that include separate and multiple analog and digital processors. The processor 852 may provide, for example, for coordination of the other components of the mobile computing device 850, such as control of user interfaces, applications run by the mobile computing device 850, and wireless communication by the mobile computing device 850.

The processor 852 may communicate with a user through a control interface 858 and a display interface 856 coupled to the display 854. The display 854 may be, for example, a TFT (Thin-Film-Transistor Liquid Crystal Display) display or an OLED (Organic Light Emitting Diode) display, or other appropriate display technology. The display interface 856 may comprise appropriate circuitry for driving the display 854 to present graphical and other information to a user. The control interface 858 may receive commands from a user and convert them for submission to the processor 852. In addition, an external interface 862 may provide communication with the processor 852, so as to enable near area communication of the mobile computing device 850 with other devices. The external interface 862 may provide, for example, for wired communication in some implementations, or for wireless communication in other implementations, and multiple interfaces may also be used.

The memory 864 stores information within the mobile computing device 850. The memory 864 can be implemented as one or more of a computer-readable medium or media, a volatile memory unit or units, or a non-volatile memory unit or units. An expansion memory 874 may also be provided and connected to the mobile computing device 850 through an expansion interface 872, which may include, for example, a SIMM (Single In Line Memory Module) card interface. The expansion memory 874 may provide extra storage space for the mobile computing device 850, or may also store applications or other information for the mobile computing device 850. Specifically, the expansion memory 874 may include instructions to carry out or supplement the processes described above, and may include secure information also. Thus, for example, the expansion memory 874 may be provided as a security module for the mobile computing device 850, and may be programmed with instructions that permit secure use of the mobile computing device 850. In addition, secure applications may be provided via the SIMM cards, along with additional information, such as placing identifying information on the SIMM card in a non-hackable manner.

The memory may include, for example, flash memory and/or NVRAM memory (non-volatile random access memory). The computer program product contains instructions that, when executed, perform one or more methods, such as those described above. The computer program product can be a computer- or machine-readable medium, such as the memory 864, the expansion memory 874, or memory on the processor 852. In some implementations, the computer program product can be received in a propagated signal, for example, over the transceiver 868 or the external interface 862.

The mobile computing device 850 may communicate wirelessly through the communication interface 866, which may include digital signal processing circuitry where necessary. The communication interface 866 may provide for communications under various modes or protocols, such as GSM voice calls (Global System for Mobile communications), SMS (Short Message Service), EMS (Enhanced Messaging Service), or MMS messaging (Multimedia Messaging Service), CDMA (code division multiple access), TDMA (time division multiple access), PDC (Personal Digital Cellular), WCDMA (Wideband Code Division Multiple Access), CDMA2000, or GPRS (General Packet Radio Service), among others. Such communication may occur, for example, through the transceiver 868 using a radio-frequency. In addition, short-range communication may occur, such as using a Bluetooth, WiFi, or other such transceiver (not shown). In addition, a GPS (Global Positioning System) receiver module 870 may provide additional navigation- and location-related wireless data to the mobile computing device 850, which may be used as appropriate by applications running on the mobile computing device 850.

The mobile computing device 850 may also communicate audibly using an audio codec 860, which may receive spoken information from a user and convert it to usable digital information. The audio codec 860 may likewise generate audible sound for a user, such as through a speaker, e.g., in a handset of the mobile computing device 850. Such sound may include sound from voice telephone calls, may include recorded sound (e.g., voice messages, music files, etc.) and may also include sound generated by applications operating on the mobile computing device 850.

The mobile computing device 850 may be implemented in a number of different forms, as shown in the figure. For example, it may be implemented as a cellular telephone 880. It may also be implemented as part of a smart-phone 882, personal digital assistant, or other similar mobile device.

Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, specially designed ASICs (application specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These various implementations can include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, coupled to receive data and instructions from, and to transmit data and instructions to, a storage system, at least one input device, and at least one output device.

These computer programs (also known as programs, software, software applications or code) include machine instructions for a programmable processor, and can be implemented in a high-level procedural and/or object-oriented programming language, and/or in assembly/machine language. As used herein, the terms machine-readable medium and computer-readable medium refer to any computer program product, apparatus and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term machine-readable signal refers to any signal used to provide machine instructions and/or data to a programmable processor.

To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user and a keyboard and a pointing device (e.g., a mouse or a trackball) by which the user can provide input to the computer. Other kinds of devices can be used to provide for interaction with a user as well; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user can be received in any form, including acoustic, speech, or tactile input.

The systems and techniques described here can be implemented in a computing system that includes a back end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front end component (e.g., a client computer having a graphical user interface or a Web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back end, middleware, or front end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a local area network (LAN), a wide area network (WAN), and the Internet.

The computing system can include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.

Although various implementations have been described in detail above, other modifications are possible. In addition, the logic flows depicted in the figures do not require the particular order shown, or sequential order, to achieve desirable results. In addition, other steps may be provided, or steps may be eliminated, from the described flows, and other components may be added to, or removed from, the described systems. Accordingly, other implementations are within the scope of the following claims.

OTHER EMBODIMENTS AND EXAMPLES

Example 1: Methods of Assessing DNA Methylation for Pregnancy Phenotyping and Disease Diagnosis

Introduction of this Example

The example relates to methods for diagnosing, prognosing, monitoring and/or treating pregnancy-associated disorders in a pregnant subject.

Background of this Example

Early detection of pregnancy-associated conditions and prenatal disorders, including potential complications during pregnancy or delivery, is crucial, as it allows early medical intervention necessary for the safety of both the mother and the fetus. For example, the pregnancy-associated condition preeclampsia affects 2%-8% of all pregnancies and contributes to 15% of preterm deliveries and between 9% and 26% of maternal deaths worldwide. A number of risk factors for preeclampsia have been identified including hypertension and diabetes, and the consequences of preeclampsia can be far reaching and can include an elevated lifetime risk of cardiovascular disease both in the mother and the infant.

Prenatal diagnosis is typically reliant on invasive procedures such as chorionic villus sampling and amniocentesis. However, such early gestational placental biopsies for prediction of complex gestational diseases are not feasible due to the costly and invasive nature of such procedures. In addition, these invasive procedures are associated with higher risks of spontaneous abortion or fetal death, procedure-induced limb defects and oromandibular hypogenesis, fetomaternal hemorrhage, persistent leakage of amniotic fluid and vertical transmission of infection, and can also cause maternal discomfort and anxiety. With regard to preeclampsia, the only reliable predictor is a previous occurrence.

Alternatives to such invasive approaches have been developed for prenatal screening following the discovery that plasma from pregnant women contains significant numbers of genome equivalents that are derived from the fetoplacental unit. It has also been demonstrated that fetoplacental RNAs are detectable in maternal plasma and that these can be targeted for non-invasive early gestational phenotyping (Koh et al., Proc Natl Acad Sci USA. 2014; 111:7361-7366 and Ngo et al., Science. 2018; 360:1133-1136). However, very little is understood regarding the potential of DNA methylation in maternal plasma to provide non-invasive diagnostic and phenotypic information during pregnancy. This is significant because numerous studies have identified altered DNA methylation in the placentas of mothers affected by complex gestational diseases (Lapaire et al., Fetal Diagn Ther. 2012; 31:147-153; Winn et al., Pregnancy Hypertens. 2011; 1:100-108; Kang et al., J Hypertens. 2011; 29:928-936; Chaouat et al., J Reprod Immunol. 2011; 89:163-172; Varkonyi et al., Placenta. 2011; 32 Suppl:S21-29; Yuen et al., Eur J Hum Genet. 2010; 18:1006-1012; Blair et al., Mol Hum Reprod. 2013; 19:697-708; and Chu et al., PLoS One. 2014; 9:e107318). However, it is not known to what extent the maternal plasma provides a window into the fetoplacental DNA methylome and therefore little progress has been made with respect to whether this approach has value for diagnosis and phenotyping of the fetoplacental unit during early gestation. Therefore, there is a need for non-invasive methods for diagnosis of pregnancy-associated conditions, e.g., preeclampsia, and prenatal conditions.

Summary of this Example

This Example relates to methods for diagnosing, prognosing, monitoring and/or treating pregnancy-associated disorders, e.g., preeclampsia, in a pregnant subject. It is based, at least in part, on the discovery that DNA is differentially methylated in blood samples from pregnant women with preeclampsia as compared to pregnant women that do not have preeclampsia. This Example further provides algorithms and kits for diagnosing, prognosing, monitoring, classifying and/or treating pregnancy-associated disorders.

In certain embodiments, a method for diagnosing, prognosing, classifying and/or monitoring a pregnancy-associated disorder in a pregnant subject can include obtaining a biological sample from the subject, determining the methylation status and/or level of one or more genomic loci in the biological sample, comparing the methylation status and/or level of the one or more genomic loci to a reference and diagnosing a pregnancy-associated disorder in the subject. In certain embodiments, a difference in the methylation status and/or level of the one or more genomic loci in the biological sample compared to the reference indicates the presence of the pregnancy-associated disorder in the subject. In certain embodiments, the reference is the methylation status and/or level of the one or more genomic loci in a biological sample obtained from a pregnant subject that does not have the pregnancy-associated disorder. In certain embodiments, the reference is the methylation status and/or level of the one or more genomic loci in a biological sample obtained from the pregnant subject being tested for the pregnancy-associated disorder. In certain embodiments, the reference is the methylation status and/or level of the one or more genomic loci in a biological sample obtained from a non-pregnant subject. In certain embodiments, the pregnancy-associated disorder is preeclampsia, preterm labor, preterm birth, hyperemesis gravidarum, ectopic pregnancy or intrauterine growth retardation. For example, but not by way of limitation, the pregnancy-associated disorder is preeclampsia. In certain embodiments, the pregnancy-associated disorder is preterm birth.

This Example further provides methods for diagnosing, prognosing and/or monitoring a pregnancy-associated disorder, e.g., preeclampsia in a pregnant subject. For example, but not by way of limitation, the methods can include obtaining a biological sample from the subject, determining the methylation status and/or level of one or more genomic loci in the biological sample, comparing the methylation status and/or level of the one or more genomic loci to a reference and diagnosing the pregnancy-associated disorder, e.g., preeclampsia, in the subject. In certain embodiments, a difference in the methylation status and/or level of the one or more genomic loci in the biological sample compared to the reference indicates the pregnancy-associated disorder, e.g., preeclampsia, in the subject. In certain embodiments, the reference is the methylation status and/or level of the one or more genomic loci in a biological sample obtained from a pregnant subject that does not have the pregnancy-associated disorder, e.g., preeclampsia. In certain embodiments, the reference is the methylation status and/or level of the one or more genomic loci in a biological sample obtained from the pregnant subject being tested for the pregnancy-associated disorder, e.g., preeclampsia. In certain embodiments, the reference is the methylation status and/or level of the one or more genomic loci in a biological sample obtained from a non-pregnant subject.

This Example further provides methods for determining if a pregnant subject has an increased risk of having a preterm birth. In certain embodiments, the method includes obtaining a biological sample from the subject, determining the methylation status and/or level of one or more genomic loci in the biological sample, comparing the methylation status and/or level of the one or more genomic loci to a reference and determining that the subject is at an increased risk of preterm birth. In certain embodiments, a difference in the methylation status and/or level of the one or more genomic loci in the biological sample compared to the reference indicates that the subject is at an increased risk of preterm birth.

In certain embodiments, a method for diagnosing, prognosing and/or monitoring a pregnancy-associated disorder in a pregnant subject can include obtaining a biological sample from the subject, determining the fraction of fetal nucleic acid (fetal fraction) in the biological sample, determining the methylation status of one or more genomic loci in placental nucleic acids present in the biological sample and diagnosing the subject with the pregnancy-associated disorder by analyzing the fetal fraction and the methylation status of the genomic loci in the placental nucleic acid. In certain embodiments, the methylation status is the methylation rate of the one or more genomic loci. In certain embodiments, the fetal fraction is determined by: analyzing the methylation status of one or more reference genomic loci in the biological sample, analyzing the methylation status of the one or more reference genomic loci in a reference sample of maternal blood cells and analyzing the methylation status of the one or more reference genomic loci in a reference sample of placental nucleic acids. In certain embodiments, the methylation status of the one or more genomic loci is determined by: analyzing the methylation status of the one or more genomic loci in the biological sample and analyzing the methylation status of the one or more genomic loci in a reference sample of maternal blood cells. In certain embodiments, the methylation status of the one or more genomic loci is determined by: analyzing the methylation status of the one or more genomic loci in the biological sample and analyzing the methylation status of the one or more genomic loci in a reference sample of blood cells from a non-pregnant individual. 
In certain embodiments, the methylation status of the one or more genomic loci is determined by: analyzing the methylation status of the one or more genomic loci in the biological sample and analyzing the methylation status of the one or more genomic loci in a reference sample of plasma from a non-pregnant individual.
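The fetal-fraction step that uses reference samples of maternal blood cells and placental nucleic acids can be illustrated with a two-component mixture model. This model is an assumption of the sketch, not a method recited in this Example: at each informative reference locus, the plasma methylation level is treated as f·(placental level) + (1 − f)·(maternal blood level), and f is estimated by least squares across loci. The function name and the clipping step are hypothetical choices made for illustration.

```python
from typing import Sequence


def estimate_fetal_fraction(
    plasma: Sequence[float],
    maternal_blood: Sequence[float],
    placenta: Sequence[float],
) -> float:
    """Least-squares fetal fraction f under the assumed mixture model
    plasma = f * placenta + (1 - f) * maternal_blood at every reference locus.

    Each argument holds methylation levels (0 to 1) at the same loci, in the same order.
    """
    if not (len(plasma) == len(maternal_blood) == len(placenta)) or len(plasma) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    num = 0.0
    den = 0.0
    for p, b, pl in zip(plasma, maternal_blood, placenta):
        d = p - b    # observed shift of plasma away from maternal blood cells
        e = pl - b   # shift expected if all DNA at this locus were placental
        num += d * e
        den += e * e
    if den == 0.0:
        raise ValueError("reference loci do not separate placenta from maternal blood")
    # Clip to [0, 1] because noisy loci can push the raw estimate out of range.
    return min(1.0, max(0.0, num / den))
```

Loci where placental and maternal blood methylation differ strongly contribute most to the estimate, which is why a least-squares fit is used here rather than a plain average of per-locus ratios.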

This Example also provides methods of treating a pregnancy-associated disorder in a pregnant subject. In certain embodiments, the method can include obtaining a biological sample from the subject, determining the methylation status and/or level of one or more genomic loci present in the biological sample, comparing the methylation status and/or level of the one or more genomic loci to a reference, diagnosing a pregnancy-associated disorder in the subject, wherein the difference in the methylation status and/or level of the one or more genomic loci in the biological sample compared to the reference indicates the presence of the pregnancy-associated disorder in the subject, and treating the subject diagnosed with the pregnancy-associated disorder. In certain embodiments, a method of treating a pregnancy-associated disorder in a pregnant subject can include diagnosing a pregnancy-associated disorder in the subject by using the algorithm disclosed in Example Embodiment B and treating the subject diagnosed with the pregnancy-associated disorder. In certain embodiments, the pregnancy-associated disorder is preeclampsia, preterm labor, hyperemesis gravidarum, ectopic pregnancy or intrauterine growth retardation. For example, but not by way of limitation, the pregnancy-associated disorder is preeclampsia. In certain embodiments, the pregnancy-associated disorder is preterm birth. In certain embodiments, the method of treating preeclampsia can include any method known in the art, e.g., one or more of the following treatments: administration of an anti-hypertensive medication, administration of HMG-CoA reductase inhibitors, delivery, administration of a corticosteroid and/or administration of an anti-convulsant medication.

In certain embodiments, an increase in the level of methylation of the one or more genomic loci in the biological sample indicates the presence of the pregnancy-associated disorder or preeclampsia in the subject. In certain embodiments, a decrease in the level of methylation of the one or more genomic loci in the biological sample indicates the presence of the pregnancy-associated disorder or preeclampsia in the subject. Alternatively and/or additionally, a decrease in the level of methylation of at least one of the one or more genomic loci in the biological sample together with an increase in the level of methylation of at least one other genomic locus in the biological sample indicates the presence of the pregnancy-associated disorder or preeclampsia in the subject.
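When some loci are expected to gain methylation in the disorder and others to lose it, the two directions can be combined into one signed score. In this minimal sketch, the expected direction of each locus (+1 for an expected increase, -1 for an expected decrease) and the decision threshold are hypothetical inputs; this Example does not specify either.

```python
from typing import Mapping


def signed_direction_score(
    sample: Mapping[str, float],
    reference: Mapping[str, float],
    expected_direction: Mapping[str, int],
) -> float:
    """Mean of (sample - reference) * direction over loci present in all inputs.

    expected_direction maps a locus ID to +1 (methylation expected to increase
    in the disorder) or -1 (expected to decrease). A positive score means the
    sample moved, on average, in the expected directions.
    """
    loci = [k for k in expected_direction if k in sample and k in reference]
    if not loci:
        raise ValueError("no loci shared by sample, reference and directions")
    total = sum((sample[k] - reference[k]) * expected_direction[k] for k in loci)
    return total / len(loci)


def indicates_disorder(score: float, threshold: float = 0.05) -> bool:
    # Hypothetical placeholder cutoff; a usable value would have to be
    # calibrated on samples with known outcomes.
    return score > threshold
```

Multiplying by the expected direction lets a hypermethylated locus and a hypomethylated locus both push the score upward, so loci of the two kinds reinforce rather than cancel each other.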

In certain embodiments, the subject is human. In certain embodiments, the biological sample is a blood sample, stool sample, saliva sample and/or urine sample obtained from the subject. For example, but not by way of limitation, the biological sample can be obtained from the pregnant subject anytime during the pregnancy but prior to onset of clinical symptoms. In certain embodiments, the biological sample can be obtained from the pregnant subject between week 10 and week 13 of gestation, e.g., for early diagnosis of preeclampsia. In certain embodiments, the one or more genomic loci are present within maternal nucleic acids isolated from the biological sample, e.g., the maternal nucleic acids are obtained from cells, e.g., leukocytes, in the biological sample or are cell-free nucleic acids, e.g., placental nucleic acids, in the biological sample. Alternatively and/or additionally, the one or more genomic loci are present within fetal nucleic acids isolated from the biological sample, e.g., the fetal nucleic acids are cell-free nucleic acids in the biological sample. In certain embodiments, the one or more genomic loci comprise one or more CpG sites.

This Example provides for algorithms for diagnosing and/or monitoring a subject with a pregnancy-associated disorder. In certain embodiments, the algorithm can be used to classify a pregnancy-associated disorder of a subject.

This Example provides for kits for diagnosing, monitoring, classifying and/or treating a subject with a pregnancy-associated disorder. In certain embodiments, a kit of this Example includes a means for determining and/or detecting the methylation status of one or more genomic loci. In certain embodiments, the kit can further include instructions for diagnosing, monitoring and/or treating a subject with a pregnancy-associated disorder, e.g., preeclampsia.

Description of this Example

This Example provides methods for diagnosing, prognosing, monitoring, classifying and/or treating pregnancy-associated disorders, e.g., preeclampsia, in a pregnant subject. It is based, at least in part, on the discovery that DNA is differentially methylated in blood samples from pregnant women with preeclampsia as compared to pregnant women that do not have preeclampsia. For example, but not by way of limitation, the methods disclosed herein include determining the methylation status of one or more genomic loci in a biological sample of a pregnant subject. In certain embodiments, the methods disclosed herein include the use of an algorithm to diagnose, prognose, monitor, classify and/or treat pregnancy-associated disorders, e.g., preeclampsia, in a pregnant subject.

Definitions of this Example

Unless defined otherwise, all technical and scientific terms used in this Example have the meaning commonly understood by a person skilled in the art to which this invention belongs. The following references provide one of skill with a general definition of many of the terms used in this invention: Singleton et al., Dictionary of Microbiology and Molecular Biology (2nd ed. 1994); The Cambridge Dictionary of Science and Technology (Walker ed., 1988); The Glossary of Genetics, 5th Ed., R. Rieger et al. (eds.), Springer Verlag (1991); and Hale & Marham, The Harper Collins Dictionary of Biology (1991). As used herein, the following terms have the meanings ascribed to them below, unless specified otherwise.

The terms “comprise(s),” “include(s),” “having,” “has,” “can,” “contain(s),” and variants thereof, as used herein, are intended to be open-ended transitional phrases, terms or words that do not preclude the possibility of additional acts or structures. The singular forms “a,” “an” and “the” include plural references unless the context clearly dictates otherwise. This Example also contemplates other embodiments “comprising,” “consisting of” and “consisting essentially of,” the embodiments or elements presented herein, whether explicitly set forth or not.

As used herein, the term “about” or “approximately” means within an acceptable error range for the particular value as determined by one of ordinary skill in the art, which will depend in part on how the value is measured or determined, i.e., the limitations of the measurement system. For example, “about” can mean within 3 or more than 3 standard deviations, per the practice in the art. Alternatively, “about” can mean a range of up to 20%, preferably up to 10%, more preferably up to 5%, and more preferably still up to 1% of a given value. Alternatively, particularly with respect to biological systems or processes, the term can mean within an order of magnitude, preferably within 5-fold, and more preferably within 2-fold, of a value.

As used herein, the term “biomarker” refers to a marker (e.g., DNA methylation status) that allows detection of a disease and/or disorder in an individual, including detection of disease in its early stages. Early stage of a disease, as used herein, refers to the time period between the onset of the disease and the time point that signs or symptoms of the disease emerge. In certain non-limiting embodiments, the presence, absence and/or level of a biomarker in a biological sample of a subject is compared to a reference control.

The terms “reference sample,” “reference control,” “control” or “reference,” as used interchangeably herein, refer to a control for a methylation status of a genomic locus that is to be detected in a biological sample of a subject. In certain embodiments, a reference sample can be a sample from a healthy pregnant individual, e.g., an individual that does not have a pregnancy-associated disorder. In certain embodiments, a reference sample can be a sample from an individual that is not pregnant. In certain embodiments, a reference sample can be a sample from an individual that does not have preeclampsia. In certain embodiments, the reference sample can be an earlier sample taken from the same subject, e.g., while they were not pregnant or during an earlier healthy pregnancy. In certain embodiments, a control or reference can be the presence, absence and/or particular level of a methylation state of a genomic locus in a healthy pregnant individual. In certain embodiments, a control can be the presence, absence and/or particular level of a methylation state of a genomic locus in a healthy individual that underwent treatment for a pregnancy-associated disorder, wherein the healthy individual is non-symptomatic. In certain embodiments, a reference can be the presence, absence and/or particular level of a methylation state of a genomic locus in a healthy individual that has never had the disease. In certain embodiments, the reference can be a predetermined presence, absence and/or particular level of a methylation state of a genomic locus that indicates a subject does not have preeclampsia.

The term “pregnancy-associated disorder,” as used herein, refers to any condition or disease that may affect a pregnant woman or both the woman and the fetus. Such a condition or disease may manifest its symptoms during a limited time period, e.g., during pregnancy or delivery, or may last the entire life span of the fetus following its birth. Non-limiting examples of pregnancy-associated disorders include preeclampsia, preterm labor, preterm birth, hyperemesis gravidarum, ectopic pregnancy, intrauterine growth retardation and genomic abnormalities (e.g., aneuploidy and chromosomal abnormalities such as trisomy 21, trisomy 18, trisomy 13 and extra or missing copies of the X or Y chromosome). In certain embodiments, the pregnancy-associated disorder is preeclampsia. In certain embodiments, the pregnancy-associated disorder is preterm birth.

As used herein, “preeclampsia” refers to a pregnancy-associated disorder that manifests as new-onset hypertension. For example, but not by way of limitation, a subject suffering from preeclampsia can have a systolic blood pressure greater than or equal to 140 mmHg, a diastolic blood pressure greater than or equal to 90 mmHg and protein in the urine at a concentration greater than 300 mg/dL/24 hr after 20 weeks gestation (i.e., arterial pressure ≥140/90 mmHg and proteinuria >300 mg/dL/24 hours after 20 weeks gestation). See, e.g., Chu et al., PLoS One. 2014; 9:e107318.

The term “slightly invasive or non-invasive method” refers to a method that does not involve the removal of tissues or fetal cells by biopsy and/or effraction from the placental barrier. In certain embodiments, slightly invasive or non-invasive methods, as disclosed herein, include the extraction of a biological sample from a subject by venipuncture.

The term “patient” or “subject,” as used interchangeably herein, refers to any pregnant warm-blooded animal, e.g., human or non-human. Non-limiting examples of non-human subjects include non-human primates, dogs, cats, mice, rats, guinea pigs, rabbits, fowl, pigs, horses, cows, goats, sheep, etc. In certain embodiments, the subject is human.

As used herein, the term “biological sample” refers to a sample of biological material obtained from a pregnant subject. In certain embodiments, the biological sample is a sample of biological material, including a biological fluid, obtained from a pregnant human subject. Non-limiting examples of a biological fluid include urine, amniotic fluid, saliva, tears, sweat, blood, plasma and serum. In certain embodiments, the biological sample is a peripheral blood sample from a pregnant subject. In certain embodiments, the blood sample can be a fractionated portion of peripheral blood, such as a plasma sample. In certain embodiments, the biological sample can be a tissue sample obtained from the fetoplacental unit, e.g., as obtained by chorionic villus sampling and/or amniocentesis. In certain embodiments, the biological sample can be maternal leukocytes obtained from a blood sample. In certain embodiments, the biological sample can be circulating fetal cells obtained from a maternal blood sample. In certain embodiments, the biological sample can be a stool sample, a sample from the embryo and/or a sample from the blastocyst.

The term “nucleic acid,” “nucleic acid molecule” or “polynucleotide” includes any compound and/or substance that comprises a polymer of nucleotides. Each nucleotide is composed of a base, specifically a purine or pyrimidine base (i.e., cytosine (C), guanine (G), adenine (A), thymine (T) or uracil (U)), a sugar (i.e., deoxyribose or ribose) and a phosphate group. In certain embodiments, the nucleic acid molecule is described by the sequence of bases, whereby said bases represent the primary structure (linear structure) of a nucleic acid molecule. The sequence of bases is typically represented from 5′ to 3′. These terms encompass deoxyribonucleic acid (DNA) including, e.g., complementary DNA (cDNA) and genomic DNA, ribonucleic acid (RNA), in particular messenger RNA (mRNA), synthetic forms of DNA or RNA, and mixed polymers comprising two or more of these molecules. The herein described nucleic acid molecule can contain naturally occurring or non-naturally occurring nucleotides. Examples of non-naturally occurring nucleotides include modified nucleotide bases with derivatized sugars or phosphate backbone linkages or chemically modified residues.

The term “isolated” (e.g., isolated genomic DNA) refers to a biological component that has been substantially separated or purified away from other biological components in the cell of the organism in which the component naturally occurs, e.g., other chromosomal and extra-chromosomal DNA and RNA, proteins and organelles. Nucleic acids, e.g., DNA, that have been “isolated” include nucleic acids purified by standard purification methods.

The term “genomic locus” or “genomic DNA locus,” as used herein, refers to any fixed position in a genome. For example, but not by way of limitation, a genomic locus can refer to a genomic element, a chromosomal region, a gene, a region of a gene, e.g., an exon or intron, a regulatory region of a gene, e.g., a promoter or enhancer, a CpG site, a CpG island or a CpG island shore. For example, but not by way of limitation, a genomic locus can include one or more CpG sites, e.g., between about 1 and about 100 CpG sites. In certain embodiments, a genomic locus can be of any particular length, e.g., between about 1 and about 10,000 nucleotides in length.

As used interchangeably herein, “methylation state,” “methylation profile,” “methylation status” and “methylation level” refer to the presence, absence, percentage and/or quantity of methylation at a particular nucleotide, or nucleotides, within a DNA region, e.g., a genomic locus. The methylation status of a particular DNA sequence (e.g., a genomic locus) can indicate the methylation state of every nucleotide in the sequence, of any of the nucleotides (e.g., cytosines) in the sequence or of a subset of the nucleotides (e.g., of cytosines); it can also indicate the percentage or fraction of methylated cytosines at any particular stretch of nucleotides within the sequence or the average rate of methylation of all the cytosines (or a subset of the cytosines) present in a nucleic acid.
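In sequencing-based assays (bisulfite sequencing is assumed here; this Example does not fix the assay), a methylation level is commonly computed from counts of methylated and unmethylated calls. The sketch below shows two of the summaries listed in the definition: the fraction of methylated cytosines at each CpG site, and a locus-level fraction pooled over all sites. The `(methylated_reads, unmethylated_reads)` input layout is a hypothetical convention.

```python
from typing import Iterable, List, Tuple

# Hypothetical per-site summary: (methylated_reads, unmethylated_reads).
SiteCounts = Tuple[int, int]


def site_fractions(sites: Iterable[SiteCounts]) -> List[float]:
    """Fraction of methylated reads at each CpG site; uncovered sites are skipped."""
    return [m / (m + u) for m, u in sites if m + u > 0]


def locus_methylation_level(sites: Iterable[SiteCounts]) -> float:
    """Fraction of methylated cytosine calls pooled across all sites of a locus."""
    meth = 0
    total = 0
    for m, u in sites:
        meth += m
        total += m + u
    if total == 0:
        raise ValueError("locus has no coverage")
    return meth / total
```

Pooling weights each site by its read depth, whereas averaging the output of `site_fractions` weights every covered site equally; for sites with counts (8, 2) and (3, 7) the pooled level is 0.55, the same as the unweighted mean only because both sites have equal depth.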

As used herein, a “methylated nucleic acid molecule” refers to a nucleic acid molecule that contains one or more methylated nucleotides.

As used herein, a “CpG site” or “methylation site” is a nucleotide within a nucleic acid that is susceptible to methylation either by naturally occurring events in vivo or by an event instituted to chemically methylate the nucleotide in vitro.

A “CpG island,” as used herein, describes a segment of a nucleic acid, e.g., a DNA sequence, that has a high frequency of CpG dinucleotides. See, e.g., Illingworth and Bird, FEBS Letters 2009; 583:1713-1720. For example, but not by way of limitation, Yamada et al. (Genome Research 2004; 14:247-266) have described a set of standards for determining a CpG island: it must be at least 400 nucleotides in length, have a GC content greater than 50% and have a ratio of observed to expected CpG frequency (OCF/ECF) greater than 0.6. Others (Takai et al., Proc. Natl. Acad. Sci. U.S.A. 2002; 99:3740-3745) have defined a CpG island less stringently as a sequence of at least 200 nucleotides in length, having a GC content greater than 50% and an OCF/ECF ratio greater than 0.6.
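The island criteria can be checked for a single candidate sequence as shown below. The observed/expected CpG ratio is computed as (CpG count × sequence length) / (C count × G count), the widely used Gardiner-Garden and Frommer formula, which is assumed here to match the OCF/ECF ratio. The default thresholds follow the less stringent definition (at least 200 nucleotides, GC content above 50%, ratio above 0.6); a genome-wide sliding-window scan is deliberately left out of this sketch.

```python
from typing import Tuple


def cpg_island_stats(seq: str) -> Tuple[int, float, float]:
    """Return (length, GC fraction, observed/expected CpG ratio) for a sequence."""
    s = seq.upper()
    n = len(s)
    c = s.count("C")
    g = s.count("G")
    # "CG" occurrences can never overlap, so str.count gives the exact CpG count.
    cpg = s.count("CG")
    gc_fraction = (c + g) / n if n else 0.0
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return n, gc_fraction, obs_exp


def is_cpg_island(
    seq: str,
    min_length: int = 200,
    min_gc: float = 0.5,
    min_obs_exp: float = 0.6,
) -> bool:
    """Length must reach min_length; GC fraction and ratio must strictly exceed their minimums."""
    n, gc, oe = cpg_island_stats(seq)
    return n >= min_length and gc > min_gc and oe > min_obs_exp
```

Passing `min_length=400` applies the length requirement of the more stringent definition. Note that a GC-rich sequence can still fail: `"CCCCGGGG" * 30` is 100% GC but has an observed/expected ratio of only 0.5.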

A “CpG island shore,” as used herein, refers to a methylation hotspot located a short distance, e.g., less than 2 kb, from a CpG island.
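Given island coordinates, a CpG site can be labeled as lying in an island, in a shore (less than 2 kb from an island) or in neither. This sketch assumes half-open `[start, end)` intervals on a single chromosome and uses a linear scan for clarity; the label `"open_sea"` for sites in neither category is a naming choice made here.

```python
from typing import Iterable, Optional, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on one chromosome


def classify_cpg_position(
    pos: int, islands: Iterable[Interval], shore_width: int = 2000
) -> str:
    """Label a position 'island', 'shore' (within shore_width bases of an island)
    or 'open_sea'. A sorted index would replace this linear scan at genome scale."""
    nearest: Optional[int] = None
    for start, end in islands:
        if start <= pos < end:
            return "island"
        # Distance to the closest base belonging to this island.
        dist = start - pos if pos < start else pos - (end - 1)
        if nearest is None or dist < nearest:
            nearest = dist
    if nearest is not None and nearest < shore_width:
        return "shore"
    return "open_sea"
```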

The term “methylome,” as used herein, refers to the amount or pattern of methylation at different sites or regions within a population of cells. The methylome can correspond to all of the genome, a subset of the genome (e.g., repeat elements in the genome) or a portion of the subset (e.g., those areas found to be associated with a pregnancy-associated disorder). A “fetal methylome” corresponds to a methylome of a fetus of a pregnant female. The fetal methylome can be determined using a variety of fetal tissues or sources of fetal DNA, including placental tissues and cell-free fetal DNA in maternal plasma. A methylome from plasma can be referred to as a “plasma methylome.” The plasma methylome is an example of a cell-free methylome since plasma and serum include cell-free DNA (cfDNA). The plasma methylome is also an example of a mixed population methylome because it is a mixture of the fetal and maternal methylomes. A “maternal methylome” corresponds to a methylome of a pregnant female. The maternal methylome can be determined using a variety of maternal tissues or sources of maternal DNA, including cell-free maternal DNA in maternal plasma and DNA from leukocytes.

As used herein, the term “increase” means to alter positively by at least about 2%, including, but not limited to, altering positively by about 5%, by about 10%, by about 15%, by about 20%, by about 25%, by about 30%, by about 35%, by about 40%, by about 45%, by about 50%, by about 55%, by about 60%, by about 65%, by about 70%, by about 75%, by about 80%, by about 85%, by about 90%, by about 95% or by about 100%.

As used herein, the terms “reduce,” “reduction” or “decrease” mean to alter negatively by at least about 2%, including, but not limited to, altering negatively by about 5%, by about 10%, by about 15%, by about 20%, by about 25%, by about 30%, by about 35%, by about 40%, by about 45%, by about 50%, by about 55%, by about 60%, by about 65%, by about 70%, by about 75%, by about 80%, by about 85%, by about 90%, by about 95% or by about 100%.
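The definitions of “increase” and “decrease” do not say whether the change is relative to the reference value or measured in absolute percentage points of methylation. This sketch assumes a relative change and uses the 2% floor from the definitions as its default.

```python
def relative_change(value: float, reference: float) -> float:
    """Signed change of value relative to reference, as a fraction (0.05 means 5%)."""
    if reference == 0:
        raise ValueError("relative change is undefined for a zero reference")
    return (value - reference) / reference


def classify_change(value: float, reference: float, min_change: float = 0.02) -> str:
    """Return 'increase', 'decrease' or 'no change' given a minimum relative change."""
    r = relative_change(value, reference)
    if r >= min_change:
        return "increase"
    if r <= -min_change:
        return "decrease"
    return "no change"
```

Under this assumption, a locus moving from 50% to 55% methylation is a 10% increase, while a move from 50% to 50.5% (a 1% relative change) falls below the floor.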

Methods of this Example

This Example provides methods for diagnosing, monitoring, classifying and/or treating pregnancy-associated disorders in a pregnant subject, e.g., by analyzing the methylation status of one or more genomic loci in a biological sample of a pregnant subject. In certain embodiments, the methods can utilize the algorithm disclosed herein. For example, but not by way of limitation, methods of this Example can be used to diagnose, monitor and/or treat pregnancy-associated disorders including, but not limited to, preeclampsia, preterm labor, preterm birth, hyperemesis gravidarum, ectopic pregnancy and/or intrauterine growth retardation. In certain embodiments, methods of this Example allow the early diagnosis of a pregnant subject with preeclampsia, e.g., within the first trimester.

In certain embodiments, the biological sample can be a blood sample (including plasma or serum), stool sample, saliva sample and/or urine sample. In certain embodiments, the biological sample can be obtained from an embryo or blastocyst, e.g., during an in vitro fertilization procedure. The step of collecting a biological sample can be carried out either directly or indirectly by any suitable technique. For example, and not by way of limitation, a blood sample can be collected from a subject by phlebotomy or any other suitable technique, and the blood sample can be processed further to provide a serum sample or other suitable blood fraction for analysis.

In certain embodiments, the biological sample, e.g., blood, is extracted from a pregnant subject at any time during the pregnancy. For example, but not by way of limitation, the biological sample can be obtained during the first trimester, second trimester or third trimester of the subject's pregnancy. In certain embodiments, the biological sample can be obtained any time during the pregnancy before the onset of the pregnancy-associated disorder and/or before the onset of symptoms of the pregnancy-associated disorder. In certain embodiments, the biological sample is obtained during the first trimester of a subject's pregnancy for early diagnosis of a pregnancy-associated disorder, e.g., preeclampsia, in the subject. In certain embodiments, a biological sample can be extracted from a pregnant subject between the 10th and 34th week of the pregnancy. In certain embodiments, the biological sample is obtained before the 34th week of pregnancy, before the 33rd week of pregnancy, before the 32nd week of pregnancy, before the 31st week of pregnancy, before the 30th week of pregnancy, before the 29th week of pregnancy, before the 28th week of pregnancy, before the 27th week of pregnancy, before the 26th week of pregnancy, before the 25th week of pregnancy, before the 24th week of pregnancy, before the 23rd week of pregnancy, before the 22nd week of pregnancy, before the 21st week of pregnancy, before the 20th week of pregnancy, before the 19th week of pregnancy, before the 18th week of pregnancy, before the 17th week of pregnancy, before the 16th week of pregnancy, before the 15th week of pregnancy, before the 14th week of pregnancy, before the 13th week of pregnancy, before the 12th week of pregnancy, before the 11th week of pregnancy or before the 10th week of pregnancy. 
For example, but not by way of limitation, the biological sample can be obtained from a pregnant subject between about week 10 and about week 13 of pregnancy, e.g., to diagnose a subject with preeclampsia.

In certain embodiments, multiple biological samples (e.g., two or more, three or more, four or more, five or more, six or more or seven or more biological samples) can be obtained during a subject's pregnancy (e.g., serially obtained samples). For example, but not by way of limitation, multiple biological samples can be obtained during the first trimester. In certain embodiments, one or more samples can be obtained during the first trimester and one or more samples can be obtained during the second or third trimester.

Diagnostic, Prognostic, Classification and Monitoring Methods of this Example

This Example provides diagnostic and prognostic methods for diseases and/or disorders that are characterized by differential methylation of genomic loci. For example, but not by way of limitation, this Example provides methods for diagnosing, prognosing, classifying and/or monitoring a pregnancy-associated disorder in a subject that include analyzing the methylation status of certain genomic loci. In certain embodiments, the pregnancy-associated disorder is preeclampsia. In certain embodiments, the pregnancy-associated disorder is preterm birth.

In certain embodiments, the analyzed genomic loci can include one or more genomic loci that exhibit differential methylation in a biological sample from a subject that has a pregnancy-associated disorder compared to a reference sample. For example, but not by way of limitation, the methods of this Example include assessing the methylation status of one or more genomic loci, e.g., about 5 or more, about 10 or more, about 50 or more, about 100 or more, about 500 or more, about 1,000 or more, about 5,000 or more, about 10,000 or more, about 25,000 or more, about 50,000 or more or about 100,000 or more genomic loci in a biological sample of a subject. In certain embodiments, the genomic loci can be selected from the genes, or a region within the genes, provided in FIGS. 15A-15B.
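One way to find loci that exhibit differential methylation between subjects with a pregnancy-associated disorder and reference subjects is to rank every locus by a two-sample statistic and keep the most extreme ones. This Example does not name a statistic; the sketch assumes Welch's t-statistic and a hypothetical top-k cutoff, and uses only the Python standard library.

```python
import math
from statistics import mean, variance
from typing import Dict, List, Sequence


def welch_t(cases: Sequence[float], controls: Sequence[float]) -> float:
    """Welch's t-statistic for mean(cases) - mean(controls); each group needs >= 2 values."""
    diff = mean(cases) - mean(controls)
    se = math.sqrt(variance(cases) / len(cases) + variance(controls) / len(controls))
    if se == 0.0:
        # Both groups are constant: report no difference or an infinite separation.
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / se


def top_differential_loci(
    cases: Dict[str, Sequence[float]],
    controls: Dict[str, Sequence[float]],
    k: int,
) -> List[str]:
    """Return up to k locus IDs with the largest |t|, most extreme first."""
    shared = [locus for locus in cases if locus in controls]
    ranked = sorted(
        shared,
        key=lambda locus: abs(welch_t(cases[locus], controls[locus])),
        reverse=True,
    )
    return ranked[:k]
```

With tens of thousands of loci, a practical pipeline would also correct for multiple testing before treating any locus as differentially methylated; that step is outside this sketch.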

In certain embodiments, the one or more genomic loci can be one or more promoter regions of one or more genes, one or more exons of one or more genes, one or more introns of one or more genes, one or more CpG sites, one or more CpG islands, one or more CpG island shores, one or more enhancers of one or more genes or a combination thereof. In certain embodiments, the genomic loci are present on a particular chromosome. For example, but not by way of limitation, the genomic loci are present on chromosome 19.

In certain embodiments, this Example provides for diagnosing, prognosing and/or monitoring a pregnancy-associated disorder in a pregnant subject by detecting the DNA methylation profiles associated with the pregnancy-associated disorder. For example, and not by way of limitation, the method can include (a) obtaining a biological sample from the subject, (b) determining the methylation status of one or more genomic loci present in the biological sample, (c) comparing the methylation status of the one or more genomic loci to a reference and (d) diagnosing a pregnancy-associated disorder in the subject, wherein the difference in the methylation status of the one or more genomic loci in the biological sample compared to the reference indicates the presence of the pregnancy-associated disorder in the subject. In certain embodiments, the difference in the methylation status can also indicate the severity of the pregnancy-associated disorder.

In certain embodiments, a method for diagnosing, prognosing and/or monitoring a pregnancy-associated disorder in a pregnant subject includes (a) obtaining a biological sample from the subject, (b) determining the level of methylation of one or more genomic loci present in the biological sample, (c) comparing the level of methylation of the one or more genomic loci to a reference and (d) diagnosing a pregnancy-associated disorder in the subject, wherein the difference in the level of methylation of the one or more genomic loci in the biological sample compared to the reference indicates the presence of the pregnancy-associated disorder in the subject. In certain embodiments, the difference in the methylation level can also indicate the severity of the pregnancy-associated disorder.
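The comparison in steps (b) through (d) can be sketched as a per-locus comparison of methylation levels against a reference. This is a minimal illustration only. The threshold value, the "at least `min_loci` differing loci" decision rule, and the names `compare_to_reference`, `indicates_disorder` and `LocusCall` are hypothetical and are not taken from this disclosure.

```python
from dataclasses import dataclass


@dataclass
class LocusCall:
    locus: str
    sample_level: float     # methylation fraction in the test sample (0-1)
    reference_level: float  # methylation fraction in the reference (0-1)
    difference: float       # sample minus reference
    differs: bool           # True if |difference| exceeds the threshold


def compare_to_reference(sample, reference, threshold=0.10):
    """Steps (b)-(c): compare per-locus methylation levels with a reference.

    sample, reference: dicts mapping a locus identifier to a methylation
    fraction in [0, 1]; only loci present in both are compared.
    threshold: hypothetical minimum absolute difference treated as a change.
    """
    calls = []
    for locus in sorted(sample.keys() & reference.keys()):
        diff = sample[locus] - reference[locus]
        calls.append(LocusCall(locus, sample[locus], reference[locus],
                               diff, abs(diff) > threshold))
    return calls


def indicates_disorder(calls, min_loci=1):
    """Step (d), hypothetical rule: flag the sample when at least
    min_loci loci differ from the reference."""
    return sum(call.differs for call in calls) >= min_loci


sample = {"locus_A": 0.62, "locus_B": 0.15}
reference = {"locus_A": 0.40, "locus_B": 0.18}
calls = compare_to_reference(sample, reference)
print([(c.locus, c.differs) for c in calls])
print(indicates_disorder(calls))
```

A real classifier would combine many loci. The simple count rule above only shows where the comparison in step (c) feeds the decision in step (d).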

In certain embodiments, this Example further provides methods for monitoring a subject at risk of developing preeclampsia. In certain embodiments, a subject at risk of developing preeclampsia is an individual that suffered from preeclampsia in an earlier pregnancy. For example, and not by way of limitation, the method can include determining the methylation status of one or more genomic loci in a biological sample obtained from the pregnant subject prior to a diagnosis of preeclampsia and determining the methylation status of the one or more genomic loci in a biological sample obtained from the subject at one or more later timepoints during the subject's pregnancy. In certain embodiments, a change in the methylation status of the one or more genomic loci in the second or subsequent samples, relative to the first sample, can indicate that the subject has developed preeclampsia.
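Serial monitoring compares each later sample with the first (baseline) sample. The sketch below assumes that each sample has already been reduced to per-locus methylation fractions. The function name `changes_from_baseline` and the 0.10 change threshold are hypothetical choices made for illustration.

```python
def changes_from_baseline(timepoints, threshold=0.10):
    """timepoints: list of (label, {locus: methylation fraction}) in time order.

    The first entry is the baseline sample taken before any diagnosis.
    Returns {label: {locus: change}} listing, for each later sample, the loci
    whose methylation moved by more than the (hypothetical) threshold.
    """
    if len(timepoints) < 2:
        raise ValueError("need a baseline and at least one later sample")
    _, baseline = timepoints[0]
    changes = {}
    for label, levels in timepoints[1:]:
        shared = sorted(baseline.keys() & levels.keys())
        changes[label] = {
            locus: round(levels[locus] - baseline[locus], 4)
            for locus in shared
            if abs(levels[locus] - baseline[locus]) > threshold
        }
    return changes


series = [
    ("week 11", {"L1": 0.30, "L2": 0.70}),
    ("week 20", {"L1": 0.33, "L2": 0.70}),
    ("week 28", {"L1": 0.48, "L2": 0.55}),
]
print(changes_from_baseline(series))
```

The same baseline-versus-later structure applies when the first sample is taken before therapy and later samples are taken during therapy.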

In certain embodiments, this Example further provides methods for determining if a pregnant subject has an increased risk of having a preterm birth. In certain embodiments, the method includes obtaining a biological sample from the subject, determining the methylation status and/or level of one or more genomic loci in the biological sample, comparing the methylation status and/or level of the one or more genomic loci to a reference and determining that the subject is at an increased risk of preterm birth. In certain embodiments, a difference in the methylation status and/or level of the one or more genomic loci in the biological sample compared to the reference indicates that the subject is at an increased risk of preterm birth.

In certain embodiments, diagnosis of a subject with a pregnancy-associated disorder, monitoring a subject at increased risk of developing a pregnancy-associated disorder or determining if a subject is at risk of having a preterm birth can be based on a higher or lower methylation level of the genomic locus in the biological sample of the subject relative to the methylation level in a reference sample, e.g., a biological sample from a non-pregnant woman or a pregnant woman that does not have the pregnancy-associated disorder. In certain embodiments, the reference sample can be a biological sample from the subject prior to being pregnant and/or during an earlier pregnancy where the subject did not have a pregnancy-associated disorder. In certain embodiments, a difference of greater than about 5%, greater than about 10%, greater than about 15%, greater than about 20%, greater than about 25%, greater than about 30%, greater than about 35%, greater than about 40%, greater than about 45%, greater than about 50%, greater than about 55%, greater than about 60%, greater than about 65%, greater than about 70%, greater than about 75%, greater than about 80%, greater than about 85%, greater than about 90% or greater than about 95% in the methylation (e.g., level, percentage and/or fraction) of the one or more genomic loci in a biological sample obtained from a subject compared to a control is indicative that the subject has the pregnancy-associated disorder or is at risk of developing the pregnancy-associated disorder. In certain embodiments, the difference can be a decrease in methylation (e.g., level, percentage and/or fraction) of the genomic loci in the biological sample of the subject. Alternatively, the difference can be an increase in methylation (e.g., level, percentage and/or fraction) of the genomic loci in the biological sample of the subject.
In certain embodiments, the difference can be a decrease in the methylation of a genomic locus and an increase in the methylation of a different genomic locus in the sample obtained from the subject. In certain embodiments, a decrease in the level of methylation of one or more genomic loci in the biological sample and the increase in the level of methylation of one or more different genomic loci in the biological sample indicates the presence of the pregnancy-associated disorder or preeclampsia in the subject.
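A "difference of greater than about X%" can be read either as an absolute difference in percentage points or as a difference relative to the control level. The text does not fix one reading, so the sketch below supports both. It also checks the mixed pattern of one locus decreasing while a different locus increases. All function names are hypothetical.

```python
def methylation_difference(subject, control, relative=False):
    """Difference in methylation between a subject and a control, in percent.

    subject, control: methylation fractions in [0, 1].
    relative=False: absolute difference in percentage points (0.40 -> 0.50 is 10).
    relative=True: difference relative to the control level (0.40 -> 0.50 is 25).
    """
    if relative:
        if control == 0:
            raise ValueError("relative difference undefined for a control level of 0")
        return 100.0 * (subject - control) / control
    return 100.0 * (subject - control)


def exceeds(subject, control, threshold_pct, relative=False):
    """True if |difference| is greater than threshold_pct (e.g. 5, 10, ..., 95)."""
    return abs(methylation_difference(subject, control, relative)) > threshold_pct


def mixed_direction(pairs, threshold_pct):
    """pairs: {locus: (subject, control)}.

    True if at least one locus is hypomethylated and a different locus is
    hypermethylated in the subject beyond threshold_pct percentage points.
    """
    diffs = [methylation_difference(s, c) for s, c in pairs.values()]
    has_decrease = any(d < -threshold_pct for d in diffs)
    has_increase = any(d > threshold_pct for d in diffs)
    return has_decrease and has_increase


print(exceeds(0.50, 0.40, 5), exceeds(0.50, 0.40, 15))
print(mixed_direction({"A": (0.20, 0.45), "B": (0.80, 0.60)}, 10))
```

Because each difference belongs to a single locus, a decrease and an increase found together necessarily come from different loci.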

In certain embodiments, diagnosis of a subject with a pregnancy-associated disorder, e.g., preeclampsia, can be based on the methylated or unmethylated state of a genomic locus, e.g., a CpG site. In certain embodiments, a genomic locus, e.g., a CpG site, in a sample from a subject diagnosed with a pregnancy-associated disorder can be methylated and the genomic locus, e.g., the CpG site, in a reference sample can be unmethylated. Alternatively, a genomic locus in a sample from a subject diagnosed with a pregnancy-associated disorder can be unmethylated and the genomic locus in a reference sample can be methylated.

In certain embodiments, this Example provides for diagnosing, prognosing, classifying and/or monitoring preeclampsia in a subject by assessing the DNA methylation profiles associated with preeclampsia. For example, and not by way of limitation, the method can include (a) obtaining a biological sample from the subject, (b) determining the methylation status and/or level of one or more genomic loci present in the biological sample, (c) comparing the methylation status and/or level of the one or more genomic loci to a reference and (d) diagnosing preeclampsia in the subject, wherein the difference in the methylation status and/or level of the one or more genomic loci in the biological sample compared to the reference indicates the presence of preeclampsia or a sub-type of preeclampsia in the subject. For example, but not by way of limitation, the method can be used to indicate whether the subject has mild preeclampsia or severe preeclampsia.

Diagnostic, Prognostic, Classification and Monitoring Methods Using an Algorithm of this Example

This Example further provides diagnostic and prognostic methods for diseases and/or disorders that are characterized by differential methylation of genomic loci by using an algorithm, as disclosed in Example Embodiment B. For example, but not by way of limitation, this Example provides methods for diagnosing, prognosing, classifying and/or monitoring a pregnancy-associated disorder in a subject that includes analyzing the methylation status of certain genomic loci and/or genomic fractions. In certain embodiments, the pregnancy-associated disorder is preeclampsia. In certain embodiments, the pregnancy-associated disorder is preterm birth.

By accurately representing the methylome of the maternal plasma as a mixture of multiple DNA methylomes of both fetal and maternal origin, the algorithm of this Example is able to provide a patient-specific reference methylation pattern for all genomic loci used as biomarkers, reduce the variance of the estimated deviation of the methylation pattern of a genomic locus in a test sample from the reference pattern, and achieve higher power when testing whether the deviation is statistically significant.
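The mixture idea can be sketched as follows. The expected plasma methylation at each locus is a weighted sum of tissue-specific methylomes, and the observed level is then compared with that patient-specific expectation. This is a simplified sketch, not the algorithm of Example Embodiment B. It assumes the mixing proportions are already known and uses a plain binomial z-score for the deviation. The tissue labels and function names are hypothetical.

```python
import math


def expected_plasma_methylation(components, weights):
    """Patient-specific reference: mix tissue methylomes by their proportions.

    components: {tissue: {locus: methylation fraction}}
    weights: {tissue: mixing proportion}; same keys, summing to 1.
    """
    if set(components) != set(weights):
        raise ValueError("components and weights must name the same tissues")
    if not math.isclose(sum(weights.values()), 1.0, abs_tol=1e-6):
        raise ValueError("mixing proportions must sum to 1")
    shared = set.intersection(*(set(levels) for levels in components.values()))
    return {
        locus: sum(weights[t] * components[t][locus] for t in components)
        for locus in shared
    }


def deviation_z_scores(observed, depth, expected):
    """Binomial z-score per locus: (observed - e) / sqrt(e * (1 - e) / n).

    observed: {locus: methylation fraction in the test plasma sample}
    depth: {locus: number of informative reads (n)}
    Loci without coverage, or with an expected level of exactly 0 or 1, are skipped.
    """
    z = {}
    for locus, e in expected.items():
        n = depth.get(locus, 0)
        if locus not in observed or n == 0 or e <= 0.0 or e >= 1.0:
            continue
        z[locus] = (observed[locus] - e) / math.sqrt(e * (1.0 - e) / n)
    return z


components = {
    "placenta": {"L1": 0.20, "L2": 0.90},
    "maternal_blood": {"L1": 0.80, "L2": 0.85},
}
weights = {"placenta": 0.10, "maternal_blood": 0.90}
expected = expected_plasma_methylation(components, weights)
z = deviation_z_scores({"L1": 0.60, "L2": 0.86}, {"L1": 100, "L2": 100}, expected)
print(expected, z)
```

In this toy input, locus L1 is expected at 0.74 but observed at 0.60, which gives a large negative z-score. L2 sits close to its expectation.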

Methods of Treatment of this Example

This Example further provides methods of treating a subject with a pregnancy-associated disorder, e.g., preeclampsia. In certain embodiments, a method of treating a pregnancy-associated disorder can include diagnosing a subject with the pregnancy-associated disorder as disclosed herein, followed by the treatment of the subject. Any of the diagnosis methods disclosed herein can be used in treating a subject.

In certain embodiments, the treatment method can include (a) obtaining a biological sample from the subject, (b) determining the methylation status and/or level of one or more genomic loci present in the biological sample, (c) comparing the methylation status and/or level of the one or more genomic loci to a reference, (d) diagnosing a pregnancy-associated disorder in the subject, where the difference in the methylation status and/or level of the one or more genomic loci in the biological sample compared to the reference indicates the presence of the pregnancy-associated disorder in the subject and (e) treating the subject diagnosed with the pregnancy-associated disorder.

In certain embodiments, the methods of treatment can include diagnosing the subject with a pregnancy-associated disorder by use of the algorithm of this Example. For example, but not by way of limitation, the treatment method can include (a) diagnosing the subject with a pregnancy-associated disorder by use of the algorithm of this Example and (b) treating the subject diagnosed with the pregnancy-associated disorder.

In certain embodiments, the treatment method can include (a) obtaining a biological sample from the subject, e.g., a blood sample, (b) determining the fraction of fetal nucleic acid in the biological sample, (c) determining the methylation status of one or more genomic loci in placental nucleic acids present in the biological sample, (d) diagnosing the subject with the pregnancy-associated disorder by analyzing the fetal fraction and the methylation status of the genomic loci in the placental nucleic acid and (e) treating the subject diagnosed with the pregnancy-associated disorder.
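This passage does not specify how the fetal fraction in step (b) is determined. One common approach uses loci that are differentially methylated between placenta and maternal blood. The sketch below is a swapped-in, least-squares estimator and not necessarily the method of this Example. It assumes each observed plasma level equals maternal + f × (placental − maternal) and solves for the single fraction f. The function name is hypothetical.

```python
def estimate_fetal_fraction(observed, maternal, placental):
    """Least-squares estimate of the fetal (placental) fraction f.

    Model per locus: observed = maternal + f * (placental - maternal).
    All inputs are {locus: methylation fraction}; only shared loci are used.
    The estimate is clipped to [0, 1].
    """
    num = 0.0
    den = 0.0
    for locus in observed.keys() & maternal.keys() & placental.keys():
        d = placental[locus] - maternal[locus]
        num += (observed[locus] - maternal[locus]) * d
        den += d * d
    if den == 0:
        raise ValueError("no loci differ between maternal and placental references")
    return min(1.0, max(0.0, num / den))


maternal = {"A": 0.90, "B": 0.10, "C": 0.50}
placental = {"A": 0.30, "B": 0.70, "C": 0.50}
observed = {"A": 0.828, "B": 0.172, "C": 0.50}  # generated with f = 0.12
print(estimate_fetal_fraction(observed, maternal, placental))
```

Loci with no placental-maternal difference (locus C here) contribute nothing to the estimate, so marker selection matters more than marker count.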

In certain embodiments, this Example provides methods for treating preeclampsia. For example, but not by way of limitation, if a subject is diagnosed with preeclampsia, the subject can be treated by any method known in the art to treat preeclampsia. For example, but not by way of limitation, the subject can be treated by administration of medication to reduce the blood pressure of the subject, e.g., by administration of an anti-hypertensive medication. In certain embodiments, delivery can be used as a treatment for preeclampsia. Additional methods of treatment include administration of a corticosteroid, administration of HMG-CoA reductase inhibitors and/or administration of an anti-convulsant medication.

In certain embodiments, the information provided by the methods described in this Example can be used by a physician in determining the most effective course of treatment (e.g., preventative or therapeutic) for the subject. A course of treatment refers to the measures taken for a patient after the prognosis or the assessment of increased risk for development of a pregnancy-associated disorder is made. For example, when a subject is identified to have an increased risk of developing a pregnancy-associated disorder, e.g., preeclampsia, the physician can determine whether frequent monitoring of DNA methylation changes should be performed as a prophylactic measure. Also, when a subject is diagnosed with a pregnancy-associated disorder, e.g., preeclampsia (e.g., based on the presence of a DNA methylation pattern in a sample from a subject), it can be advantageous to follow such detection with a therapeutic treatment.

In certain embodiments, this Example further provides methods for assessing the efficacy of a therapeutic or prophylactic therapy for preventing, inhibiting or treating a pregnancy-associated disorder in a subject, comprising determining the methylation status of one or more genomic loci present in a biological sample obtained from the subject prior to therapy and determining the methylation status of the one or more genomic loci present in a biological sample obtained from the subject at one or more time points during therapeutic or prophylactic therapy, wherein the therapy is efficacious for preventing, inhibiting and/or treating a pregnancy-associated disorder in a subject when there is a change in the presence and/or level of methylation of the one or more genomic loci in the second or subsequent samples, relative to the first sample. In certain embodiments, the first sample is obtained after therapeutic treatment has begun.

In certain embodiments, the methods for monitoring the response in a subject to prophylactic or therapeutic treatment (for example, administration of an anti-hypertensive to a subject diagnosed with preeclampsia) can include measuring the methylation status and/or level of one or more genomic loci in a biological sample of a subject at a first timepoint, administering a therapeutic agent, re-measuring the methylation status and/or level of the one or more genomic loci at a second timepoint, comparing the results of the first and second measurements and optionally modifying the treatment regimen based on the comparison. In certain embodiments, the first timepoint is prior to an administration of the therapeutic agent, and the second timepoint is after said administration of the therapeutic agent. In certain embodiments, the first timepoint is prior to the administration of the therapeutic agent to the subject for the first time. In certain embodiments, the dose (defined as the quantity of therapeutic agent administered at any one administration) is increased or decreased in response to the comparison. In certain embodiments, the dosing interval (defined as the time between successive administrations) can be increased or decreased in response to the comparison, including total discontinuation of treatment. In addition, the method of the present disclosure can be used to determine the efficacy of the therapeutic treatment, wherein a change in the methylation status of certain genomic loci present in a biological sample of a subject can indicate that the therapeutic treatment regimen can be altered, reduced and/or stopped.

Assays of this Example

This Example further provides assays and/or methods for determining the DNA methylation status and/or level of genomic loci that correlate with the presence, absence and/or severity of a pregnancy-associated disorder. For example, but not by way of limitation, the pregnancy-associated disorder is preeclampsia. In certain embodiments, a method can include comparing the methylation status and/or level of genomic loci present in a biological sample from a subject that has a pregnancy-associated disorder to the methylation status and/or level of genomic loci in a biological sample from a healthy subject to determine the methylation pattern, as disclosed above, that correlates with the presence of the pregnancy-associated disorder. In certain embodiments, a method can include comparing the methylation status and/or level of genomic loci in a biological sample from a subject that has a pregnancy-associated disorder at an early stage (or less severe case, e.g., mild preeclampsia) to the methylation status and/or level of genomic loci in a biological sample from a subject that has the pregnancy-associated disorder at a late stage (or a more severe case, e.g., severe preeclampsia), as disclosed above, to determine the methylation status and/or level that correlates with the different stages and/or severity of the pregnancy-associated disorder. Non-limiting examples of pregnancy-associated disorders include preeclampsia, preterm labor, preterm birth, hyperemesis gravidarum, ectopic pregnancy, intrauterine growth retardation and genomic abnormalities (e.g., aneuploidy and chromosomal abnormalities such as trisomy 21, trisomy 18, trisomy 13 and extra or missing copies of the X chromosome and Y chromosome).
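Comparing groups of affected and healthy subjects (or mild and severe cases) to find loci that correlate with the disorder can be sketched as a per-locus two-group test. Welch's t statistic is used here as one reasonable choice. The disclosure does not name a specific test. The minimum mean-difference filter and the function names are hypothetical.

```python
import math
from statistics import mean, variance


def welch_t(cases, controls):
    """Welch's t statistic for one locus; inputs are per-subject methylation fractions."""
    if len(cases) < 2 or len(controls) < 2:
        raise ValueError("need at least two subjects per group")
    se = math.sqrt(variance(cases) / len(cases) + variance(controls) / len(controls))
    if se == 0:
        raise ValueError("zero variance in both groups")
    return (mean(cases) - mean(controls)) / se


def rank_loci(case_levels, control_levels, min_abs_diff=0.1):
    """case_levels, control_levels: {locus: [fraction per subject]}.

    Returns (locus, mean difference, t) for loci whose absolute mean difference
    exceeds min_abs_diff, sorted by |t| from largest to smallest.
    """
    rows = []
    for locus in case_levels.keys() & control_levels.keys():
        diff = mean(case_levels[locus]) - mean(control_levels[locus])
        if abs(diff) > min_abs_diff:
            rows.append((locus, diff, welch_t(case_levels[locus], control_levels[locus])))
    return sorted(rows, key=lambda row: abs(row[2]), reverse=True)


cases = {"A": [0.60, 0.70, 0.65], "B": [0.30, 0.32, 0.31]}
controls = {"A": [0.40, 0.45, 0.42], "B": [0.30, 0.33, 0.29]}
rows = rank_loci(cases, controls)
print(rows)
```

With thousands of loci, the resulting statistics would normally be converted to p-values and corrected for multiple testing before any locus is treated as a marker.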

DNA Isolation Techniques of this Example

In certain embodiments, the methods of this Example include obtaining nucleic acid from a biological sample from a subject, e.g., a blood sample. Several platforms are known in the art and currently available to isolate nucleic acids from biological samples. For example, but not by way of limitation, isolation of DNA from a biological sample can be performed by extraction methods using organic solvents such as a mixture of phenol and chloroform, followed by precipitation with ethanol (see, for example, J. Sambrook et al., "Molecular Cloning: A Laboratory Manual," 1989, 2nd Ed., Cold Spring Harbor Laboratory Press: New York, N.Y.). Additional non-limiting examples include salting out DNA extraction (see, for example, P. Sunnucks et al., Genetics, 1996, 144:747-756; and S. M. Aljanabi and I. Martinez, Nucl. Acids Res. 1997, 25:4692-4693), the trimethylammonium bromide salts DNA extraction method (see, for example, S. Gustincich et al., BioTechniques, 1991, 11:298-302) and the guanidinium thiocyanate DNA extraction method (see, for example, J. B. W. Hammond et al., Biochemistry, 1996, 240:298-300). Numerous commercially available kits can also be used to extract DNA from biological fluids or cells, e.g., from a blood sample or a tissue sample obtained from a pregnant woman, for example, Qiagen's Gentra PureGene Cell Kit, QIAamp Circulating Nucleic Acid Kit, QIAamp DNA Mini Kit, DNeasy Blood and Tissue Kit or QIAamp DNA Blood Mini Kit (Qiagen, Hilden, Germany), GenomicPrep™ Blood DNA Isolation Kit (Promega, Madison, Wis.) and GFX™ Genomic Blood DNA Purification Kit (Amersham, Piscataway, N.J.).

In certain embodiments, the biological sample can be enriched or relatively enriched for maternal nucleic acids (e.g., maternal DNA) by one or more methods. For example, but not by way of limitation, maternal peripheral blood can be collected from a pregnant subject by venipuncture, e.g., during the first trimester of pregnancy, and DNA from leukocytes obtained from the blood sample, i.e., maternal DNA, can be isolated in the disclosed methods. In certain embodiments, placental nucleic acids can be isolated from a biological sample and used in the methods of this Example.

In certain embodiments, the biological sample can be first enriched or relatively enriched for fetal nucleic acids (e.g., fetal DNA) by one or more methods. For example, but not by way of limitation, discrimination between fetal and maternal DNA can be performed by detecting one or more of the following: single nucleotide differences between chromosome X and Y, chromosome Y-specific sequences, polymorphisms located elsewhere in the genome, size differences between fetal and maternal DNA and differences in the methylation pattern between maternal and fetal tissues. For example, but not by way of limitation, fetal nucleic acids can be enriched from a biological sample based on the differential methylation between fetal and maternal nucleic acids. In certain embodiments, separation of fetal and maternal nucleic acids can be based on the methylation status of a genomic locus, e.g., a CpG site or a genomic locus that includes a CpG site. In certain embodiments, the genomic locus of the maternal DNA is methylated and the genomic locus of the fetal DNA is unmethylated. In certain embodiments, the genomic locus of the maternal DNA is unmethylated and the genomic locus of the fetal DNA is methylated. In certain embodiments, the genomic locus of the maternal DNA is hypomethylated compared to the genomic locus of the fetal DNA. In certain embodiments, the genomic locus of the maternal DNA is less methylated compared to the genomic locus of the fetal DNA. In certain embodiments, the genomic locus of the maternal DNA is hypermethylated compared to the genomic locus of the fetal DNA. In certain embodiments, the genomic locus of the maternal DNA is more methylated compared to the genomic locus of the fetal DNA. See, e.g., U.S. 2010/0105049, the contents of which are incorporated by reference in their entirety.
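When fetal and maternal DNA differ in methylation at a locus, sequencing reads covering that locus can be sorted computationally by their methylation pattern. The sketch below assumes per-read CpG calls are already available as booleans. By default it assumes the embodiment in which the maternal locus is methylated and the fetal locus unmethylated, with a flag for the reverse case. The 0.2 and 0.8 cut-offs and the function names are hypothetical.

```python
from collections import Counter


def assign_read(cpg_calls, fetal_methylated=False, low=0.2, high=0.8):
    """Assign one read covering a differentially methylated locus.

    cpg_calls: booleans, one per CpG on the read (True = methylated).
    fetal_methylated: False assumes the fetal allele is unmethylated and the
    maternal allele methylated at this locus; True assumes the reverse.
    low, high: hypothetical cut-offs on the read's methylated-CpG fraction.
    """
    if not cpg_calls:
        return "ambiguous"
    fraction = sum(cpg_calls) / len(cpg_calls)
    if fraction <= low:
        read_is_methylated = False
    elif fraction >= high:
        read_is_methylated = True
    else:
        return "ambiguous"
    return "fetal" if read_is_methylated == fetal_methylated else "maternal"


def partition_reads(reads, **kwargs):
    """Count reads by assigned origin; 'ambiguous' reads could be filtered out."""
    return Counter(assign_read(calls, **kwargs) for calls in reads)


reads = [
    [False, False, False],
    [True, True, True],
    [True, False, True],
    [False, False, True, False],
]
print(partition_reads(reads))
```

Reads with intermediate patterns are left unassigned rather than forced into either group, which trades yield for purity of the enriched fraction.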

Methylation Detection Techniques of this Example

Various methylation analysis procedures are known in the art, and can be used in the methods of this Example. These assays allow for determination of the methylation state of one or more genomic loci, e.g., one or more CpG sites or CpG islands within a nucleic acid obtained from a biological sample. In addition, the methods can be used to quantify the methylation of a genomic locus. Such assays involve, among other techniques, DNA sequencing of bisulfite-treated DNA, PCR (for sequence-specific amplification), digital PCR and use of methylation-sensitive restriction enzymes.

In certain embodiments, methylation-specific PCR can be used to determine the methylation status of a genomic locus. Methylation-specific PCR is based on a chemical reaction of sodium bisulfite with DNA that converts unmethylated cytosines, e.g., of CpG dinucleotides, to uracil or UpG, followed by traditional PCR. Methylated cytosines are not converted in this process, and primers can be designed to overlap the methylation site, e.g., CpG site, of interest, thereby allowing one to determine the methylation status of the methylation site as methylated or unmethylated. Additionally, restriction enzyme digestion of PCR products amplified from bisulfite-converted DNA may be used, e.g., by using the method described by Sadri & Hornsby (Nucl. Acids Res. 1996; 24:5058-5059) or COBRA (Combined Bisulfite Restriction Analysis) (Xiong & Laird, Nucleic Acids Res. 1997; 25:2532-2534).
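Primer design for methylation-specific PCR starts from the expected sequence of the template after bisulfite treatment. The sketch below performs in silico conversion of the top strand only. Unmethylated C becomes T (uracil is read as thymine after PCR), and cytosines listed as methylated stay C. A fully methylated template keeps "CG" at each CpG, while an unmethylated one reads "TG", and primers overlapping those positions can discriminate between the two. The helper names and the example sequence are hypothetical.

```python
def cpg_positions(seq):
    """0-based indices of the C in each CpG dinucleotide."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"]


def bisulfite_convert(seq, methylated_positions=frozenset()):
    """Top-strand in silico bisulfite conversion.

    Unmethylated C -> T (uracil is read as thymine after PCR);
    cytosines whose index is in methylated_positions are left as C.
    """
    out = []
    for i, base in enumerate(seq.upper()):
        if base == "C" and i not in methylated_positions:
            out.append("T")
        else:
            out.append(base)
    return "".join(out)


template = "ACGTTCGACCA"
methylated = bisulfite_convert(template, set(cpg_positions(template)))
unmethylated = bisulfite_convert(template)
print(cpg_positions(template), methylated, unmethylated)
```

Non-CpG cytosines (positions 8 and 9 here) convert in both versions, which is also why bisulfite-converted DNA has reduced sequence complexity.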

In certain embodiments, whole genome bisulfite sequencing, which is a high-throughput genome-wide analysis of DNA methylation, can be used to determine the methylation status of multiple genomic loci. It is based on sodium bisulfite conversion of genomic DNA, as described above, which is then sequenced on a next-generation sequencing platform. The sequences obtained are then aligned to the reference genome to determine the methylation states of cytosines, e.g., of CpG dinucleotides, present within the analyzed genomic loci based on mismatches resulting from the conversion of unmethylated cytosines into uracil (read as thymine after amplification).
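After alignment, each reference CpG cytosine is scored by counting reads that show C (methylated, protected from conversion) versus T (unmethylated, converted). The methylation level is C / (C + T). The sketch below makes strong simplifying assumptions. Reads are already aligned to the top strand without gaps, given as (start, sequence) pairs, and no base-quality or duplicate filtering is applied. Dedicated bisulfite aligners handle both strands, gaps and quality. The function name is hypothetical.

```python
from collections import defaultdict


def call_cpg_methylation(reference, alignments):
    """Per-CpG methylation calls from gap-free, top-strand bisulfite alignments.

    reference: reference sequence (top strand).
    alignments: list of (start, read_sequence), start being a 0-based offset.
    Returns {cpg_position: (methylated_count, unmethylated_count, level)}.
    """
    reference = reference.upper()
    cpgs = {i for i in range(len(reference) - 1) if reference[i:i + 2] == "CG"}
    counts = defaultdict(lambda: [0, 0])
    for start, read in alignments:
        for offset, base in enumerate(read.upper()):
            pos = start + offset
            if pos not in cpgs:
                continue
            if base == "C":      # cytosine survived bisulfite: methylated
                counts[pos][0] += 1
            elif base == "T":    # converted cytosine: unmethylated
                counts[pos][1] += 1
    return {pos: (m, u, m / (m + u)) for pos, (m, u) in sorted(counts.items())}


reference = "ACGTTCGACCA"
reads = [(0, "ACGTTTGAT"), (0, "ATGTTCGAT"), (1, "CGTTCG")]
calls = call_cpg_methylation(reference, reads)
print(calls)
```

Bases other than C or T at a CpG position (for example sequencing errors) are ignored rather than counted.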

In certain embodiments, genome-wide DNA methylation profiling can be performed using commercially-available arrays, thereby allowing the interrogation of multiple genomic loci, e.g., multiple CpG sites. Non-limiting examples of such arrays include HumanMethylation BeadChips (Illumina, San Diego, Calif.) and Infinium MethylationEPIC kit (Illumina). Additional methods for analyzing the methylation state of multiple genomic loci are provided in Yong et al., Epigenetics & Chromatin 2016; 9:26, which is incorporated by reference herein.

Kits of this Example

This Example provides kits for diagnosing, monitoring, classifying and/or treating a subject with a pregnancy-associated disorder that comprise a means for determining and/or detecting the methylation status of one or more genomic loci. For example, but not by way of limitation, a kit of this Example can be used to diagnose, monitor and/or treat a subject with preeclampsia. In certain embodiments, a kit of this Example can be used to diagnose, monitor and/or treat a subject with preterm labor and/or preterm birth.

Types of kits include, but are not limited to, packaged probe and primer sets (e.g., TaqMan probe/primer sets), arrays/microarrays, which further contain one or more probes, primers or other detection reagents for determining the methylation state and/or level of one or more genomic loci. For example, but not by way of limitation, a kit of this Example can include one or more probes or primers for detecting the methylation state of one or more genomic loci. In certain embodiments, the one or more genomic loci comprise a CpG site. In certain embodiments, one or more of the genomic loci do not comprise a CpG site. For example, but not by way of limitation, about 5% or more, about 10% or more, about 15% or more, about 20% or more, about 25% or more, about 30% or more, about 35% or more, about 40% or more, about 45% or more, about 50% or more, about 55% or more, about 60% or more, about 65% or more or about 70% or more of the one or more genomic loci detected by the primers or probes of this Example comprise one or more CpG sites.

In certain non-limiting embodiments, a primer and/or probe of this Example can be at least about 10 nucleotides or at least about 15 nucleotides or at least about 20 nucleotides in length and/or up to about 200 nucleotides or up to about 150 nucleotides or up to about 100 nucleotides or up to about 75 nucleotides or up to about 50 nucleotides in length.
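Two kit properties described above can be checked computationally when assembling a probe or primer set. One is the fraction of target loci that contain at least one CpG site. The other is whether each oligonucleotide falls within a chosen length range. The sketch below uses the broadest bounds stated (10 to 200 nucleotides) as defaults. The function names and example sequences are hypothetical.

```python
def cpg_fraction(targets):
    """Fraction of target loci whose (top-strand) sequence contains a CpG.

    targets: {locus: genomic target sequence}.
    """
    if not targets:
        return 0.0
    with_cpg = sum("CG" in seq.upper() for seq in targets.values())
    return with_cpg / len(targets)


def oligos_out_of_range(oligos, min_len=10, max_len=200):
    """Names of primers/probes whose length falls outside [min_len, max_len]."""
    return sorted(name for name, seq in oligos.items()
                  if not min_len <= len(seq) <= max_len)


targets = {"t1": "AACGTT", "t2": "AATTGG", "t3": "CCGG", "t4": "ATAT"}
oligos = {"p1": "A" * 8, "p2": "ACGT" * 5, "p3": "A" * 250}
print(cpg_fraction(targets), oligos_out_of_range(oligos))
```

Tighter limits (e.g. 15 to 50 nucleotides) can be passed as arguments to match any of the narrower embodiments.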

In a further non-limiting embodiment, the oligonucleotide primers and/or probes can be immobilized on a solid surface or support, for example, on a nucleic acid microarray, wherein the position of each oligonucleotide primer and/or probe bound to the solid surface or support is known and identifiable.

In certain non-limiting embodiments, a kit of this Example can additionally include other components such as, but not limited to, a buffer, enzymes such as DNA polymerases or ligases, nucleotides such as deoxynucleotide triphosphates, positive control sequences, negative control sequences and the like necessary to carry out an assay or reaction to detect the methylation state of a genomic locus.

In certain embodiments, this Example provides for a kit that includes a container comprising one or more probes and/or primers for detecting the methylation state of one or more genomic loci. The kit can further include instructions for use, e.g., the instructions can describe that a particular methylation status of a genomic locus is indicative of a pregnancy-associated disorder in a subject. The instructions can be printed directly on the container (when present), or as a label applied to the container, or as a separate sheet, pamphlet, card or folder supplied in or with the container.

Reports, Programmed Computers and Systems of this Example

In certain embodiments, the diagnosis and/or monitoring of a pregnancy-associated disorder, e.g., preeclampsia, in a subject based on the methylation status of one or more genomic loci, can be referred to herein as a “report.” A tangible report can optionally be generated as part of a testing process (which can be interchangeably referred to herein as “reporting,” or as “providing” a report, “producing” a report or “generating” a report).

Examples of tangible reports can include, but are not limited to, reports in paper (such as computer-generated printouts of test results) or equivalent formats and reports stored on computer readable medium (such as a CD, USB flash drive or other removable storage device, computer hard drive, or computer network server, etc.). Reports, particularly those stored on computer readable medium, can be part of a database, which can optionally be accessible via the internet (such as a database of patient records or genetic information stored on a computer network server, which can be a “secure database” that has security features that limit access to the report, such as to allow only the patient and the patient's medical practitioners to view the report while preventing other unauthorized individuals from viewing the report, for example). In addition to, or as an alternative to, generating a tangible report, reports can also be displayed on a computer screen (or the display of another electronic device or instrument).

A report can include, for example, an individual's medical history, or can just include size, presence, absence or levels of one or more markers (for example, a report on computer readable medium such as a network server can include hyperlink(s) to one or more journal publications or websites that describe the medical/biological implications). Thus, for example, the report can include information of medical/biological significance as well as optionally also including information regarding the methylation status of relevant genomic loci, or the report can just include information regarding the methylation status of relevant genomic loci without other medical/biological significance.

A report can further be “transmitted” or “communicated” (these terms can be used herein interchangeably), such as to the individual who was tested, a medical practitioner (e.g., a doctor, nurse, clinical laboratory practitioner, genetic counselor, etc.), a healthcare organization, a clinical laboratory, and/or any other party or requester intended to view or possess the report. The act of “transmitting” or “communicating” a report can be by any means known in the art, based on the format of the report. Furthermore, “transmitting” or “communicating” a report can include delivering a report (“pushing”) and/or retrieving (“pulling”) a report. For example, reports can be transmitted/communicated by various means, including being physically transferred between parties (such as for reports in paper format) such as by being physically delivered from one party to another, or by being transmitted electronically or in signal form (e.g., via e-mail or over the internet, by facsimile, and/or by any wired or wireless communication methods known in the art) such as by being retrieved from a database stored on a computer network server, etc.

In certain embodiments, this Example provides computers (or other apparatus/devices such as biomedical devices or laboratory instrumentation) programmed to carry out the methods of this Example, e.g., to perform the algorithm of this Example (see Example Embodiment B). In certain embodiments, the system can be controlled by the individual and/or their medical practitioner in that the individual and/or their medical practitioner requests the test, receives the test results back and (optionally) acts on the test results to reduce the individual's pregnancy-associated disorder risk or treat the individual, such as by implementing a disorder management system.

The following Example Embodiments are offered to more fully illustrate the disclosure of the Example, but are not to be construed as limiting the scope thereof.

Example Embodiment A of Example 1: DNA is Differentially Methylated in Healthy Pregnant Females and Pregnant Females with Preeclampsia

DNA methylation patterns are associated with cellular phenotypes, are altered in complex gestational disease states and are cell lineage-specific. Thus, DNA methylation signatures identified in plasma potentially contain information relating to both pathobiology and the cell lineage-specific origins of the signal.

The maternal plasma compartment contains biomarkers derived from both mother and fetus and offers several distinct theoretical advantages. Obtaining maternal plasma is less invasive than obtaining amniotic fluid, placental biopsy or fetal blood. Maternal blood is drawn routinely at several points in time during prenatal care and, thus, a plasma biomarker could be incorporated into the usual provision of care. Using maternal plasma for biomarkers of preterm birth (PTB) also facilitates central processing and analysis. After minimal processing at the point of care, blood specimens may be transported to a central facility for processing and analysis.

The present Example Embodiment A used solution-phase hybridization to perform targeted region capture of bisulfite-converted DNA obtained from the plasma of pregnant women in early gestation and of non-pregnant female controls, as disclosed in Example Embodiment C. Targeted sequencing of 80.4 Mb of the plasma methylome generated an average read depth of ~42× in 18 plasma samples. These data were used to identify the pregnancy-specific characteristics of cell-free DNA (cfDNA) methylation in plasma, and showed that pregnancy resulted in clearly detectable global alterations in DNA methylation patterns that were modulated by genomic location. Similar data from first-trimester maternal leukocyte populations and gestational age-matched chorionic villus (CV) samples were analyzed and confirmed that tissue-specific DNA methylation signatures in these samples had a significant influence on global and gene-specific methylation in the plasma of pregnant versus non-pregnant women. The subject matter disclosed in this Example can be used in the context of non-invasive prenatal testing for phenotypic pregnancy monitoring and for the early detection of complex gestational phenotypes such as preeclampsia and preterm birth.

Results

DNA was extracted from the plasma of both pregnant and non-pregnant women. Blood was drawn from the pregnant women at 10-13 weeks of gestation. Plasma DNA was subjected to solution-phase hybridization capture and DNA sequencing. Summary sequencing data are shown in Table 1. It was previously found that CpG methylation levels in early gestational maternal leukocytes were distributed bimodally, with two peaks reflecting predominantly low methylation (LM; <20%) and high methylation (HM; >80%) states, respectively. Relatively few CpG sites existed in an intermediate methylation (IM; 20-80%) state in these samples (Chu et al., PLoS One. 2011; 6:e14723). Previous studies have also determined that, compared with maternal leukocytes, the early gestational chorionic villus (CV) contains relatively fewer HM CpG sites, more IM sites and slightly more LM sites (Chu et al., PLoS One. 2011; 6:e14723).

TABLE 1

Sample ID   Read Pairs     Mapping Efficiency   Mean Target Coverage (×)
EN1         65,822,964     84.53%               25.6
EN2         82,220,206     84.84%               31.7
EN3         124,211,203    84.56%               11.0
EN4         82,129,212     87.74%               24.4
EN5         57,426,035     84.90%               16.6
EN6         60,396,896     85.30%               41.9
EN7         73,976,467     84.51%               40.2
EN8         76,707,685     84.44%               36.3
EN9         65,869,746     84.73%               37.3
EN11        114,262,664    84.85%               33.2
EN12        152,428,644    84.82%               98.9
PL12        100,790,172    83.47%               40.1
PL230       94,809,078     84.76%               71.3
PL891       78,878,511     83.43%               46.2
PL897       117,999,412    83.71%               61.3
PL1523      123,747,149    84.65%               59.5
PL1746      88,845,825     84.80%               39.7
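As a consistency check on the average read depth quoted for this Example, the per-sample mean target coverage values in Table 1 can be averaged directly. The following minimal Python sketch does this; the tuple layout and the names `TABLE_1` and `mean_coverage` are illustrative assumptions, while the values are copied from Table 1 (EN denotes non-pregnant and PL denotes pregnant samples).

```python
from statistics import mean

# (sample ID, read pairs, mapping efficiency %, mean target coverage), copied from Table 1.
TABLE_1 = [
    ("EN1", 65_822_964, 84.53, 25.6),
    ("EN2", 82_220_206, 84.84, 31.7),
    ("EN3", 124_211_203, 84.56, 11.0),
    ("EN4", 82_129_212, 87.74, 24.4),
    ("EN5", 57_426_035, 84.90, 16.6),
    ("EN6", 60_396_896, 85.30, 41.9),
    ("EN7", 73_976_467, 84.51, 40.2),
    ("EN8", 76_707_685, 84.44, 36.3),
    ("EN9", 65_869_746, 84.73, 37.3),
    ("EN11", 114_262_664, 84.85, 33.2),
    ("EN12", 152_428_644, 84.82, 98.9),
    ("PL12", 100_790_172, 83.47, 40.1),
    ("PL230", 94_809_078, 84.76, 71.3),
    ("PL891", 78_878_511, 83.43, 46.2),
    ("PL897", 117_999_412, 83.71, 61.3),
    ("PL1523", 123_747_149, 84.65, 59.5),
    ("PL1746", 88_845_825, 84.80, 39.7),
]


def mean_coverage(rows, prefix=""):
    """Mean target coverage over samples whose ID starts with `prefix` ("" selects all)."""
    return mean(row[3] for row in rows if row[0].startswith(prefix))


overall = mean_coverage(TABLE_1)              # all samples listed in Table 1
non_pregnant = mean_coverage(TABLE_1, "EN")   # non-pregnant plasma samples
pregnant = mean_coverage(TABLE_1, "PL")       # pregnant plasma samples
print(f"all: {overall:.2f}x  EN: {non_pregnant:.2f}x  PL: {pregnant:.2f}x")
```

Averaged over the 17 samples listed in Table 1, the mean target coverage is about 42.07×, in line with the ~42× figure; the pregnant (PL) samples have a higher mean coverage than the non-pregnant (EN) samples.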

Differences in methylation between DNA obtained from pregnant women and DNA obtained from non-pregnant women were then analyzed. As shown in FIGS. 9A-9G, which provide the distribution of DNA methylation in plasma samples from pregnant and non-pregnant women, methylation at various sites differed between DNA isolated from the plasma of pregnant women and DNA isolated from the plasma of non-pregnant women. As shown in FIG. 9A, the CpG methylation distribution of circulating cell-free plasma DNA from non-pregnant women largely reflected that of maternal leukocytes. Although this distribution was preserved in the plasma of pregnant women, there was a notable reduction in HM sites and an increase in IM sites. Given that the early gestational placenta displays a significant reduction in HM CpG sites (Chu et al. PLoS One. 2011; 6:e14723), the distribution of CpG methylation in the plasma of pregnant women, compared with non-pregnant women, reflected the presence in maternal plasma of significant numbers of placentally derived circulating cfDNA fragments. Thus, patterns of CpG methylation in the CV genome were broadly visible through the window of maternal plasma.
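The LM/IM/HM distributions compared above can be illustrated with a short Python sketch. It assumes that per-site methylation is computed as the fraction of bisulfite reads that retain a cytosine at the CpG, and that the 20% and 80% boundaries both fall in the IM state; the function names and the read counts in the usage example are hypothetical.

```python
from collections import Counter


def percent_methylated(methylated_reads: int, total_reads: int) -> float:
    """Percent methylation at one CpG: reads retaining C after bisulfite / all reads covering it."""
    if total_reads <= 0:
        raise ValueError("CpG site has no read coverage")
    return 100.0 * methylated_reads / total_reads


def methylation_state(level: float) -> str:
    """Map a percent-methylation value to the LM/IM/HM states used in the text."""
    if level < 20.0:
        return "LM"  # low methylation (<20%)
    if level > 80.0:
        return "HM"  # high methylation (>80%)
    return "IM"      # intermediate methylation (20-80%, boundaries inclusive by assumption)


def state_fractions(levels):
    """Fraction of CpG sites in each state, for comparing distributions between samples."""
    counts = Counter(methylation_state(x) for x in levels)
    total = sum(counts.values())
    return {state: counts[state] / total for state in ("LM", "IM", "HM")}


# Hypothetical (methylated reads, total reads) counts at four CpG sites in one sample.
sites = [(1, 20), (10, 20), (19, 20), (18, 20)]
print(state_fractions(percent_methylated(m, n) for m, n in sites))
```

Comparing the output of `state_fractions` between pregnant and non-pregnant samples corresponds to the shift from HM toward IM sites described above.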

Methylation distributions were influenced by the location of the CpG sites of interest. CpG sites present in specific genomic elements, namely exons, introns, promoters, CpG islands (CGIs), CGI shores and enhancers, were examined. To reduce potential bias created by the presence of CGIs within other elements, these regions of interest were filtered to exclude those that overlapped with CGIs (except, of course, for the CGIs themselves). As shown in FIGS. 9A-9G, methylation distributions were influenced by CpG site location. Promoters contained relatively fewer HM sites (FIG. 9B) than all sites combined (FIG. 9A). CpGs in introns and exons were skewed towards HM sites (FIGS. 9C and 9D). Relative distributions in CGIs were similar to those in promoters (FIG. 9E). CpGs in CGI shores and enhancers contained relatively more IM sites (FIGS. 9F and 9G), and CGI shores contained more HM sites than LM sites. In every case, pregnancy resulted in a clear relative reduction in HM sites and an increase in IM sites (FIGS. 9B-9G), reflecting the presence in maternal plasma of a significant number of relatively hypomethylated CV genome equivalents. These results establish that pregnancy has a significant, global impact on the methylation profile of circulating cfDNA, and that this impact likely depends to a large degree on chorionic villus-derived cfDNA fragments.
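The exclusion of regions that overlap CGIs can be sketched as a simple interval-overlap filter. This is a minimal Python illustration, not the disclosed pipeline: it assumes half-open [start, end) coordinates (as in BED files), uses a linear scan per chromosome rather than an interval tree, and the coordinates in the usage example are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# (chromosome, start, end) with half-open coordinates [start, end).
Region = Tuple[str, int, int]


def overlaps(a: Region, b: Region) -> bool:
    """True if two half-open intervals on the same chromosome share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def exclude_cgi_overlaps(regions: Iterable[Region], cgis: Iterable[Region]) -> List[Region]:
    """Drop every region (e.g., promoter, exon, enhancer) that overlaps any CpG island."""
    cgis_by_chrom: Dict[str, List[Region]] = defaultdict(list)
    for cgi in cgis:
        cgis_by_chrom[cgi[0]].append(cgi)
    return [
        region
        for region in regions
        if not any(overlaps(region, cgi) for cgi in cgis_by_chrom.get(region[0], []))
    ]


# Hypothetical promoter and CGI coordinates.
promoters = [("chr1", 100, 200), ("chr1", 250, 300), ("chr2", 100, 200)]
cgis = [("chr1", 150, 250)]
kept = exclude_cgi_overlaps(promoters, cgis)
print(kept)
```

The CGI set itself is not passed through this filter, mirroring the exception for CGIs noted above. At genome scale, sorting the intervals or using an interval tree would avoid the per-region scan.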

Spatial differences in CpG methylation between cell-free plasma DNA from pregnant and non-pregnant women were then examined. CpG methylation levels were plotted spatially, both genome-wide and for each autosome. FIG. 10 provides a heatmap comparing plasma DNA methylation between pregnant (PL) and non-pregnant (EN) women and shows that methylation levels were broadly reduced in DNA extracted from maternal plasma relative to plasma from non-pregnant women. This relationship was consistent across every autosome, as shown in FIG. 11, which provides chromosome-specific plots of spatial DNA methylation distributions in plasma samples from pregnant women (obtained at 10-13 weeks of gestation) and non-pregnant (control) women. These results further demonstrate that pregnancy is a significant contributor to the CpG methylation signature detectable in circulating cell-free plasma DNA in the first trimester of pregnancy.
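Spatial plots such as those in FIGS. 10 and 11 are typically built by averaging CpG methylation within genomic windows. The Python sketch below shows one way to do this; the 1 Mb window size, the input layout and the function names are assumptions for illustration, since the text does not state how the plots were binned.

```python
from collections import defaultdict


def windowed_mean_methylation(sites, window_size=1_000_000):
    """Average percent methylation of CpG sites in fixed-size genomic windows.

    sites: iterable of (chromosome, position, percent_methylated).
    Returns {(chromosome, window_start): mean percent methylation}.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for chrom, pos, level in sites:
        key = (chrom, (pos // window_size) * window_size)
        sums[key] += level
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


def window_differences(pregnant, non_pregnant):
    """Pregnant minus non-pregnant mean methylation for windows present in both profiles."""
    shared = sorted(pregnant.keys() & non_pregnant.keys())
    return {key: pregnant[key] - non_pregnant[key] for key in shared}


# Hypothetical per-site methylation values for one pregnant and one non-pregnant sample.
pl = windowed_mean_methylation([("chr1", 10, 60.0), ("chr1", 20, 70.0), ("chr1", 1_500_000, 40.0)])
en = windowed_mean_methylation([("chr1", 15, 80.0), ("chr1", 1_200_000, 50.0)])
diffs = window_differences(pl, en)
print(diffs)
```

Negative values in `diffs` correspond to windows that are less methylated in maternal plasma, which is the predominant pattern reported for FIG. 10.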

The impact of genomic location on the ability of the presently disclosed subject matter to detect spatial differences in DNA methylation between plasma from pregnant and non-pregnant women was then examined. The methylation rates of CpG sites in each of the structural and regulatory genomic elements described above were compared between plasma DNA from pregnant and non-pregnant women. As before, regions of interest were filtered to exclude those that overlapped with CGIs. As shown in FIGS. 12A-12F, the extent of differential methylation between plasma DNA from pregnant and non-pregnant women varied by genomic element. CpGs within exons (FIG. 12B) and introns (FIG. 12C) displayed the most significant pregnancy-specific differences in methylation, followed by those in promoters (FIG. 12A). Sites within CGI shores (FIG. 12D) and enhancers (FIG. 12E) showed smaller differences in global methylation level between pregnant and non-pregnant plasma, and CGIs (FIG. 12F) displayed almost no difference at this resolution. This shows that the impact of the first-trimester placental methylome on CpG methylation levels in maternal plasma depends significantly on genomic location.
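The per-element comparison can be summarized as a grouped mean of pregnant-minus-non-pregnant methylation differences. The Python sketch below is a hypothetical illustration: the tuple layout, the element labels and the `overlaps_cgi` flag are assumptions, and the values in the usage example are invented for demonstration only.

```python
from collections import defaultdict


def mean_difference_by_element(sites, exclude_cgi_overlap=True):
    """Mean pregnant-minus-non-pregnant methylation difference per genomic element.

    sites: iterable of (element, overlaps_cgi, pregnant_pct, non_pregnant_pct).
    Sites overlapping a CGI are skipped for every element except "CGI" itself,
    mirroring the filtering described in the text.
    """
    diffs = defaultdict(list)
    for element, overlaps_cgi, pregnant_pct, non_pregnant_pct in sites:
        if exclude_cgi_overlap and overlaps_cgi and element != "CGI":
            continue
        diffs[element].append(pregnant_pct - non_pregnant_pct)
    return {element: sum(values) / len(values) for element, values in diffs.items()}


# Hypothetical CpG-level values; the third exon site overlaps a CGI and is excluded.
example = [
    ("exon", False, 60.0, 80.0),
    ("exon", False, 70.0, 80.0),
    ("exon", True, 10.0, 90.0),
    ("CGI", True, 10.0, 11.0),
]
by_element = mean_difference_by_element(example)
print(by_element)
```

A large negative mean for exons and a value near zero for CGIs would correspond to the pattern described for FIGS. 12B and 12F.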

To identify specific CpG sites whose methylation levels differed between pregnant and non-pregnant plasma, a multiple-testing significance filter of q ≤ 0.1 and a methylation-difference filter of more than ±5% were applied. These analyses confirmed the overall reduction in CpG methylation observed in maternal plasma compared with plasma from non-pregnant women. Specifically, 24,398 CpG sites were identified whose methylation levels differed significantly between pregnant and non-pregnant subjects. A large majority of these (19,470 sites, or ~80%) displayed lower cfDNA methylation in plasma from pregnant women than in plasma from non-pregnant women. Among CpG sites in exons, 10.35-fold more were less methylated than more methylated in the plasma of pregnant compared with non-pregnant women (2463/2702, or ~91%). Similar ratios were identified for introns, with 8.82-fold more CpG sites (6032/6803, or ~89%) less methylated in the plasma of pregnant compared with non-pregnant women. There were 4.53-fold more CpG sites in promoters that were less methylated in pregnant vs. non-pregnant plasma (2078/2536, or ~82%), and similar numbers for CGI shores (4.32-fold; 2987/3671, ~81%). Similarly, of 78 CpGs in enhancers (excluding those overlapping with CGIs), 63 (~81%; 4.2-fold) were less methylated in pregnant vs. non-pregnant plasma. Notably, there were only 2.14-fold more CpGs in CGIs (510/1092, or ~55%) that were less methylated in cfDNA from the plasma of pregnant vs. non-pregnant women. These results are reflected in the data presented in FIG. 11 and provide considerably more insight into the magnitude of the impact of pregnancy on cfDNA methylation profiles in plasma.
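The two filters described above, and the way counts of less- versus more-methylated sites can be summarized, can be sketched in Python as follows. The sketch assumes that "q=<0.1" means q ≤ 0.1, that the ±5% filter means an absolute difference greater than 5 percentage points, and that the methylation difference is case minus control; the class and function names are illustrative, and the lower-to-higher ratio is one plausible way to express the fold figures. The example rows are copied from Table 2.

```python
from typing import Iterable, List, NamedTuple


class CpGResult(NamedTuple):
    chrom: str
    start: int
    qvalue: float
    meth_diff: float  # case minus control, in percentage points (assumed convention)


def significant(results: Iterable[CpGResult], q_max: float = 0.1,
                min_abs_diff: float = 5.0) -> List[CpGResult]:
    """Keep sites with q <= q_max and an absolute methylation difference above min_abs_diff."""
    return [r for r in results if r.qvalue <= q_max and abs(r.meth_diff) > min_abs_diff]


def direction_summary(results: List[CpGResult]) -> dict:
    """Count less- versus more-methylated sites and express the imbalance as a ratio."""
    lower = sum(1 for r in results if r.meth_diff < 0)
    higher = sum(1 for r in results if r.meth_diff > 0)
    return {
        "lower": lower,
        "higher": higher,
        "fraction_lower": lower / len(results) if results else 0.0,
        "fold_lower_vs_higher": lower / higher if higher else float("inf"),
    }


# A few rows from Table 2 (severe preeclampsia versus normal).
rows = [
    CpGResult("chr14", 3630780, 1.042273646659133e-14, -69.4444444444444),
    CpGResult("chr9", 6639395, 0.0740092321987059, -64.0449438202247),
    CpGResult("chr11", 8031559, 0.495141213330146, -36.2747784781219),   # fails the q filter
    CpGResult("chr17", 48735209, 0.049354247370586, 2.86168521462639),   # fails the 5% filter
    CpGResult("chr13", 87677946, 0.0211380343473178, 6.23052959501558),
]
hits = significant(rows)
print(direction_summary(hits))
```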

It was next determined whether specific loci known to have distinct epigenetic signatures in the CV genome influence maternal plasma DNA methylation profiles in a predictable fashion. To identify such loci, CpG methylation was compared between early gestational CV tissue samples (11-13 weeks) and gestational age-matched maternal leukocytes (MBC), because CV and MBC contribute the majority of circulating cfDNA fragments to maternal plasma during pregnancy. These assays were performed using solution-phase hybridization followed by bisulfite DNA sequencing, and a multiple-testing significance filter of q ≤ 0.1 was applied. CpG sites that were differentially methylated both between the CV and MBC genomes and between plasma samples from pregnant and non-pregnant women were then identified. In every instance, if a CpG site was differentially methylated in both comparisons, the direction of change was the same. That is, if a CpG site was less methylated in CV than in MBC, it was also less methylated in the plasma of pregnant women than in that of non-pregnant women; a total of 6,558 CpG sites followed this trend, examples of which are shown in FIGS. 18A-18L. Similarly, if a CpG site was more methylated in CV than in MBC, it was also more methylated in the plasma of pregnant women than in that of non-pregnant women; a total of 1,707 CpG sites followed this trend, examples of which are shown in FIGS. 18M-18X. No instances were found in which these directional relationships did not hold. These results demonstrate the strong influence of circulating cfDNA fragments originating in the chorionic villus on the molecular phenotype of the maternal plasma DNA methylome in early gestation.
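The directional comparison described above can be sketched as an intersection of two sets of significant CpG sites followed by a sign check. In this minimal Python illustration, sites are keyed by a hypothetical (chromosome, position) tuple, each value is the methylation difference in percentage points (CV minus MBC, or pregnant minus non-pregnant), and the input values are invented for demonstration.

```python
def concordance(cv_vs_mbc, pregnant_vs_non_pregnant):
    """Classify CpG sites significant in both comparisons by direction of change.

    Each argument maps a CpG site (chromosome, position) to its methylation
    difference in percentage points for sites that passed the significance filter.
    Returns (lower in both, higher in both, discordant) as sorted site lists.
    """
    shared = sorted(cv_vs_mbc.keys() & pregnant_vs_non_pregnant.keys())
    lower_in_both = [s for s in shared if cv_vs_mbc[s] < 0 and pregnant_vs_non_pregnant[s] < 0]
    higher_in_both = [s for s in shared if cv_vs_mbc[s] > 0 and pregnant_vs_non_pregnant[s] > 0]
    discordant = [s for s in shared if cv_vs_mbc[s] * pregnant_vs_non_pregnant[s] < 0]
    return lower_in_both, higher_in_both, discordant


# Hypothetical differences for sites passing q <= 0.1 in each comparison.
cv_mbc = {("chr1", 1000): -40.0, ("chr1", 2000): 25.0, ("chr2", 500): -30.0, ("chr3", 10): 12.0}
preg = {("chr1", 1000): -8.0, ("chr1", 2000): 6.0, ("chr2", 500): -12.0, ("chr5", 70): -9.0}
lower, higher, discordant = concordance(cv_mbc, preg)
print(len(lower), len(higher), len(discordant))
```

In the study, the corresponding counts were 6,558 sites lower in both comparisons, 1,707 sites higher in both, and no discordant sites.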

Differences in the distribution of DNA methylation were also observed between chorionic villus sampling (CVS) samples and maternal leukocyte samples obtained from pregnant women at 10-13 weeks of gestation. As shown in FIG. 13, methylation at various sites differed between DNA isolated from maternal leukocyte samples and DNA isolated from CVS samples. FIG. 13A shows the methylation of all sites analyzed, FIG. 13B shows the methylation of sites present within promoters, FIG. 13C shows the methylation of sites present within introns of genes, FIG. 13D shows the methylation of sites present within exons of genes, FIG. 13E shows the methylation of sites present within CpG island shores, FIG. 13F shows the methylation of sites present within CpG islands and FIG. 13G shows the methylation of sites present within enhancers. In particular, significant differences were observed in the methylation status of CpG islands, CpG island shores and enhancers. Differences in the methylation of genomic elements between maternal leukocyte samples and CVS samples were also analyzed. As shown in FIG. 14, differential methylation between DNA isolated from maternal leukocyte samples and CVS samples varied by genomic element, with greater differences observed at sites within exons, introns and enhancers.

To determine whether genomic regions are differentially methylated in preeclampsia, maternal plasma samples were collected at 10-13 weeks of gestation from women who later developed severe or mild preeclampsia. As shown in FIG. 15, which provides heatmaps of differentially methylated CpG sites within specific genes, significant differences in CpG methylation were observed between pregnant women who went on to develop severe preeclampsia (SPE; FIG. 15A) or mild preeclampsia (MPE; FIG. 15B) and pregnant women who did not develop preeclampsia (Ctr). FIG. 16 shows the DNA methylation patterns of chromosome 1 in plasma samples obtained at 10-13 weeks of gestation from women who went on to develop severe or mild preeclampsia, compared with normal controls; specific regions within chromosome 1 were differentially methylated in these women relative to pregnant women who did not develop preeclampsia. Tables 2-4 list the genomic loci that are differentially methylated in preeclampsia patients: Table 2 compares patients with severe preeclampsia to normal patients, Table 3 compares normal patients to patients with mild preeclampsia, and Table 4 compares patients with severe preeclampsia to patients with mild preeclampsia.

TABLE 2 Severe Preeclampsia versus Normal chr start end strand pvalue qvalue meth.diff chr14 3630780 34630780 * 1.23757164166253e−20 1.042273646659133e−14 −69.4444444444444 chr14 100562866 1005626866 * 1.93875488245721e−10 0.343459581332533e−05 −66.677822400714 chr10 80316236 803162236 * 1.36257656350231e−07

−65.2777777777778 chr9 6639395 6639395 * 5.13477476101507e−06 0.0740092321987059 −64.0449438202247 chr3 184083996 184063996 * 5.69057647450106e−06

−61.8041765543427 chr17 67836273 67835273 * 2.06550818522239e−09 0.000221397954240477 −61.4474408365438 chr12 52433824 52433824 *  1.3901734845378e−09 0.000167256226154071 −58.2053501180173 chr14 59713230 59713230 * 4.75620035556604e−

0.070625341303741 −59.0909090909091 chr4 8366705 8366705 * 2.46392999035932e−06 0.0457065024985565 −58.4877622377622 chr3

* 1.73676157460199e−06 0.0376426916901624 −57.3343824713252 chr20 38698763 38698763 * 1.53214328471838e−16 6.94608947471698e−11 −57.2191166599007 chr14 100562865 100562865 *  3.5318390553361e−

0.0583232963007572 −56.6965517241379 chr4 8366877 8366877 * 2.50030133599156e−07 0.010075342199196 −56.6508092282977 chr4 8366676 8366676 * 2.01821805612074e−07 0.00649663983877575 −56.3573625694972 chr7 2123618 2123618 *

4.63379604746384e−05 −56.3555347091932 chr5 181193978 181193978 * 4.82857178702083e−10  6.7764433307093e−05 −56.1562323592302 chr20

*

7.15442052471664e−08

chr11 125932408 125932408 * 3.26700517199238e−09

−53.6016949152542 chr1 205849727 205849727 * 5.19656282371184e−07 0.01770684052329949 −53.5017730496454 chr1 147098664 147096684 *

.9151506676131e−11 7.69372391425696e−05 −53.1922043010753 chr7 4833972 4833972 * 3.50606198251754e−08 0.0580605895994088 −52.6116939883646 chr5 181193979 181193979 * 3.20366650351068e−15 1.1195386542593e−09 −51.1572498298162 chr10 94390089 94390089 *

0.0616405238802056 −50.9295499021527 chrX 142972654 142972654 * 2.77792054960497e−16 1.16977195343523e−10

chrX 5828406 5828406 * 1.63509075637424e−11 3.57015829285575e−06 −48.1912144702842 chr15 50944108 50944108 * 1.03069739500366e−06

−48.1324278438031

180432021 180432021 * 4.42630224445324e−11 8.41760592191312e−06 −46.7545787545788 chr2 121226155 121226155 * 9.18590257807462e−07 0.0250713348175674 −46.5711503686187

14803557 14803557 * 4.10811700269213e−06 0.0649018977494697 −46.5405911928851 chr5 180432022 180432022 * 1.45171617533349e−07 0.008846696479283

chr12 128705935 128705935 *

4.22659018652328e−23 −44.7227983579116 chr17 78429801 78429801 * 7.25226999556526e−07 0.0213773240807728 −44.0374003795066 chr6 28444550 28444550 * 4.74288643278353e−09 0.000417327780639565 −43.6251920122888 chr2 240900828 240900828 *  1.1644271575075e−74

−43.3805682467162 chr5 175555631 175555631 * 3.38562904250277e−06 0.0584880066848964 −43.1745031746032 chr1 205849838 205849838 * 6.19096061641065e−07 0.0195227571945974

chrX 142972653 142972653 * 1.22864900146637e−07 0.00622598807228257 −42.7534418022528 chr5 181193879 181193879 *  3.1232729731788e−13 8.36944452391397e−08 −42.6925926926927 chr22 50300221 50300221 * 8.27605998982079e−

0.0934679031257298 −42.0496894409938 chr7 904231 904231 * 1.09939542198965e−

0.0279367133743123 −41.3101983949422 chr22 49239411 49239411 * 2.06341099607852e−07 0.00656656607429472 −39.9092970521542 chr13 114087703 114087703 * 2.50115545257257e−

0.0472767137975261 −39.8302006550456 chr1 121006621 121006621 * 4.47885349679804e−06 0.0662284122113361 −39.6768826700404 chr2 105599292 105599292 * 2.48712460856619e−

0.048995072435054 −39.3647181381715 chr2 240900827 240900827 * 4.24010299264063e−

1.24884392286775e−29 −38.461422543701 chr10 80273713 80273713 * 1.23242597855127e−30

−38.1627828435339 chr1 205849826 205849826 * 9.22990309950144e−06 0.0990711541344773 −38.0831212692282 chr2 105599293 105599293 * 1.31975309310901e−08 0.00105141370157715

chr14 92667121 92667121 * 2.22643778184888e−06 0.0437520776596981 −37.1031746031746 chr8 123851131 123851131 * 3.94219081356403e−

0.0629525059881015 −36.7587232045063 chr7 143443324 143443324 * 6.10229062628313e−06 0.0802319051571692 −36.3359707851491 chr11 8031559 8031559 *  2.7366575657266e−06 0.495141213330146 −36.2747784781219 chr2 232520493 232520493 * 1.18172568538255e−

0.000954340133444324 −36.1954887218048 chr7 904312 904312 * 2.05612595771965e−06 0.0414358634417034 −35.4140829543889 chr17 6997044 6997044 * 1.36283520517278e−14 4.22862444121564e−09 −35.1351351351351 chr10 129048340 129048340 *

0.000282761486039124 −35.0382845915636 chr10 129046339 129046339 * 1.73859926079335e−07 0.007065007040002 −34.95337995338 chr1 169158038 169158038 * 1.56325481535507e−09 0.00017950025742979 −34.7366421052632 chr12 127705934 127705934 * 2.68039058344032e−07 0.0104647697913332 −34.5158002038736 chr4 26083923 26083923 * 4.33199808088278e−06

−34.3992773261066 chr10 131397640 131397640 * 7.37261632374229e−06 0.0581625385929518 −34.3293117171463 chr18 42795335 42795335 * 1.13715064226165e−

0.000931096892695178 −34.2955473660022 chr4 92697208 92697208 * 3.81808787652981e−07 0.0134566624568968 −34.2697594501718 chr1 205650256 205650256 * 2.02202569728084e−

0.0413907747513456 −34.1249226953513 chr20 3650911 3650911 * 4.87572798743508e−09 0.000422707475934433 −34.1176470588235 chr17 6997043 6997043 *  1.1019113537954e−23 1.0826916323106e−17 −33.6709677419355 chr4 99072347 99072347 * 8.06514713089625e−07 0.0231778715158391 −33.3548458885056 chr12 64828874 64828874 * 4.96051999439397e−06

−33.3397832517337 chr14 51508130 51508130 * 8.55511095832576e−

0.00460336644979361 −33.2306111967129 chr14 92687198 92687198 * 5.27577991954763e−07 0.0177728876122088 −33.1851179673321 chr16 33932399 33932399 * 1.40436685477237e−07 0.00673106986650923 −33.1529561529582 chr14 92687205 92687205 * 1.11121583813308e−06 0.0334048397934732 −32.7631578947368 chr1 4161376 4161376 * 1.40538280258833e−06 0.0330088417892059 −32.6213090392195 chr1 24595268 24595268 * 5.67954148117829e−09 0.000485258954093333 −32.5809940286068 chr21 45304888 45304888 * 3.62045084752562e−07 0.0134568624568968 −32.3863836363636 chr1 109712040 109712040 * 3.01185893127661e−06 0.0525324068488898 −31.9859885915705 chr10 44910843 44910843 *  7.7263708029507e−07 0.0225493252148527 −31.5325056088044 chr1 205550472 205550472 * 4.72952773994055e−07 0.0153053843117403 −30.8545027599036 chr1 21428121 21428121 * 5.89429147947907e−06 0.0759587300954092 −30.8253968253968 chr16 78046399 78046399 * 2.06183471870158e−06 0.0414358864417034 −30.7692307692308 chr16 87963156 87963156 *   6.542277206852E−06 0.0629440737732974 −30.5769230769231 chr15 25177929 25177929 * 5.20175126558262e−

0.0032280135492946 −30.3661616151616 chr17 81159480 81159480 * 1.17788440569205e−07 0.0060382933021247 −30.1676481192184 chr22 49239412 49239412 * 2.56600853755404e−07 0.0101289543974943 −29.2403100775194 chr2 130808958 130808958 * 1.01978919977991e−09 0.000127912623247126 −29.2139195277303 chrX 46574452 46574452 *

0.022728747705268 −28.9164077623111 chr14 68695029 68695029 * 8.97786573456088e−

0.0959577598056528 −28.6007438758497 chr21 45304887 45304887 * 1.70459115138157e−06 0.0370817576772098 −28.4476451724158 chr2 130193974 130193974 * 3.38444555514297e−08 0.0022176897988434

chr4 169154581 169154581 *  4.9899650680501e−08 0.00312952931851425 −27.8231621488588 chr10 133272250 133272250 * 1.44490804768508e−06 0.0334048397934732 −27.7777777777776 chr15 52204700 52204700 *

0.0439055437354353 −27.702056432329 chr2 32203968 32203968 * 4.25470681648478e−06 0.0656622300716585 −26 chr10 44910843 44910843 * 1.80546716538467e−07 0.00787736633553591 −25.9353937202038 chr18 80147494 80147494 *

0.0656161957474253 −25.787728028534 chr18 861938 861938 * 6.06231367941984e−08 0.00357394427630267 −25.6210604375232 chr6 144354406 144354406 * 2.18050500676714e−06 0.0429927546827077 −25.5763212161269 chr7 70809483 70809483 *

−25.5308641975309 chr4 187997035 187997035 * 2.69104557651176e−06 0.049354247370596 −25.0264214753752 chr6 159836629 159836629 * 7.20872854132426e−07 0.0213557664991661 −24.9026544734441 chr2 130193993 130193993 * 4.19993792913393e−07 0.0146509418988426 −24.6996949257108 chr5 150846762 150846762 * 6.71723460796621e−07 0.020518356940058 −24.6074348770578 chr15 81118715 81118715 * 8.71178281447527e−

0.09581934279192 −24.4988625719256 chr19 34838866 34838866 * 7.43877477057478e−06 0.0681277435158626 −24.1642084562439 chr9 39133013 39133013 * 4.68242526760355e−06 0.070077245951675 −23.9316239316239 chr17 58520372 58520372 * 8.65429821377254e−07 0.0242952614641285 −23.7095141700405 chr2 130193992 130193992 * 5.08232778437366e−

0.017378240735756 −29.5759267939756 chr6 109901748 109901748 * 7.57378186133299e−

0.0887675391150375 −22.9825899097725 chr6 1427666 1427666 *  4.1656054954838e−06 0.0653130520338026 −22.8012564249801 chr2 130193975 130193975 * 7.63997909945208e−

0.000100059616296525 −22.7777777777778 chr1 111778211 111778211 * 1.38526468537416e−07 0.00670439793176889 −22.6514654418198 chr12 131804120 131804120 * 1.60355088648069e−

0.0355394316415577 −22.4025974025974 chr15 94715715 94715715 * 8.03990294313617e−06 0.0922541353104587 −22.2307104660846 chr6 27213934 27213934 * 5.30074322138427e−06 0.0753204952188882 −21.8158372897251 chr19 34838821 34838821 *  4.2255574048317e−06

−21.6517657142857 chr6 27214027 27214027 * 8.17644643418618e−

0.0928993827561695 −21.4678272742789 chr5 132948556 132948556 * 1.09679422056492e−07 0.00562520947551778 −21.4397496087637 chr1 148310450 148310450 * 2.92561301205199e−07

−21.4387601152105 chr5 13770123 13770123 * 2.97750526917204e−

0.0522530956076664 −21.0654555962025 chr10 119028861 119028861 * 5.41505577263468e−07 0.018045942850302 −21.010101010101 chr14 54349504 54349504 * 1.32500782359076e−06 0.0371607265944355 −20.9900005847611 chr2 9122698 9122698 *

0.0553922940139813 −20.7370043435617 chr 8 56117913 56117913 * 1.31209250928954e−06 0.0314440676356203 −20.2140367129831 chr3 192571473 192571473 * 1.51012441582648e−08 0.00117140887427228

chr3 192571509 192571509 * 1.23198728176775e−06 0.0301059625728883 −19.57246581581 chr1 109712391 109712391 * 8.36381052244206e−06

−19.2891080054867 chr14 103075201 103075201 * 4.97080244677135e−07 0.0170375612163756 −19.135410044501 chr14 103075202 103075202 * 8.04742970205609e−07 0.0123778715158391 −19.1011235955056 chr3 192571475 192571475 * 4.29345309935699e−09 0.000389406069984431 −19.0929251913166 chr3 128078786 128078786 * 8.93677019482845e−

0.0989577598058528 −18.9655172413793 chr3 192571457 192571457 * 3.18199115275514e−06 0.0545318081291326 −18.9288971897668 chr3 192571464 192571464 * 8.16265991559531e−06 0.0928993827561696 −18.8854401805869 chr 

1229 1229 * 9.25465083046279e−

0.0990711541344773 −18.8479252572811 chr2 74501497 74501497 *

−18.7544483985765 chr9 137435110 137435110 * 3.27132912037723e−06 0.0553922940439813 −18.4988303357945 chr18 21896998 21896998 * 2.57621167333829e−07 0.0101289543974943 −18.4446322907861 chr9 41269669 41269669 * 8.20157694827191e−06 0.0929829554947384 −18.3809956063631 chr2 240647539 240647539 *

0.0871757023007231 −18.3847663010248 chr10 1774890 1774890 * 7.41911120927606e−

0.0882377435158626 −18.2339658729901 chr9 133788456 133788456 * 5.85389135968004e−06 0.0786094346155254 −18.1078276659677 chr20 63007936 63007936 * 2.36283979197505e−06 0.0453738126355073 −18.0519996468615 chr19 49155243 49155243 * 4.41322599883591e−08 0.067492892065868 −18.048303783331 chr12 8321565 8321565 * 1.87307308936261e−07 0.00800175059286567 −17.9609115552061 chr3 192571414 192571414 *  8.5009966803365e−06 0.0949172818272803 −17.9395881550856 chr10 86403169 86403169 * 1.88645415553304e−06 0.039437240590281 −17.8405001834759 chr10 59283484 59283484 * 2.87524309202921e−06 0.0509925697803757 −17.8802242781048 chr17 8888553 8888553 * 8.66305234324867e−08 0.09581934279192 −17.5944490309392 chr14 52043983 52043983 * 6.42607874239494e−07 0.0200445950219647 −17.5354453019117 chr10 119028855 119028855 * 5.48484355960944e−06 0.0764422128269312 −17.3903638151426 chr1 80197189 80197189 * 6.46758553616037e−

0.000544695156969233 −17.2946556247137 chr10 133497580 133497580 * 5.25908496378768e−07 0.0177728876122088 −17.2454876702245 chr3 192571454 192571454 * 1.87361668787714e−06 0.0393124540648982 −17.2451241134752 chr4 16034373 16034373 * 1.74430702931042e−06

−17.1641791044776 chr17 52043983 52043983 *  1.9812424421928e−06 0.0406608739611971 −17.1368951250225 chr11 3123574 3123574 * 5.78086117798307e−

0.0783425368589402 −17.1212121212121 chr11 12193269 12193269 * 2.17360181038483e−06 0.0429927546527077 −16.9700289514095 chr5 502298 502298 * 5.45495798489112e−08 0.00331543744240762

chr1 228211541 228211541 * 7.69335762497404e−

0.0394579677608044 −16.7807869187532 chr2 231592876 231592876 * 1.56145627015732e−06 0.0348686611284209 −16.3443670150987 chr17 3695455 3695455 * 8.02230489421483e−06 0.6922541353104587 −16.1024844720497 chr6 317262 317262 * 2.58873014807748e−06 0.0481062202469341 −16.0769860769881 chr7 73770020 73770020 * 5.54099620562508e−06 0.0755741040514544 −16.0227272727273 chr7 98399974 98399974 * 8.17917720593016e−07 0.0232942453156757 −15.5682309688581 chr12 64320160 64320160 * 6.46807216516429e−07 0.0200445950218647 −15.4077540105952 chr6 12857835 12857835 *  6.8014319507603e−06 0.0846874616877994 −15.3610243238581 chr14 54349491 54349491 * 4.11736743542011e−06 0.0649018977494697 −15.0819672131148 chr4 80197333 80197333 * 4.40212680119678e−09 0.000393213106631316 −14.9885730221029 chr2 216810316 216810316 * 8.65636492696555e−

0.09581934279192 −14.9299719867955 chr3 51717773 51717773 * 5.93012904896357e−06 0.0790954038425797 −14.4593068071349 chr7 98626784 98626784 * 6.95479892954345e−06 0.0857039652361363 −14.3399372692262 chr4 80197401 80197401 * 6.51606438527839e−06 0.0829440737732974 −14.3055230060152 chr4 106681728 106681728 * 1.01959742093521e−06

−14.3023036509735 chr4 80197434 80197434 * 2.47665753584192e−07

−14.2822601644162 chr4 80197448 80197448 * 3.90192206510683e−09 0.000371019105496151 −13.8938431904116 chr15 81134210 81134210 * 1.46473540854918e−07 0.00885927281633657

chr2 238139939 238139939 * 6.72797903159993e−06 0.0842118288254895 −12.7743498146194 chr12 8321458 8321458 * 3.16414431293741e−07 0.0116585804267766 −12.7116456741669 chr15 81134204 81134204 * 3.14989784152142e−07 0.0116585804267766

chr13 51263036 51263036 * 8.24457527543057e−06 0.0932910405302952 −12.5668418171866 chr4 80197386 80197386 * 4.41138286705164e−06 0.057402892065868 −12.3397435897436 chr4 184592009 184592009 * 4.08890824143861e−06 0.0647997880208238 −12.3244719732236 chr17 41522424 41522424 * 4.85064838768374e−

0.0714905096742304 −12.2570026063612 chr4 80197352 80197352 *  7.9042025396316e−06 0.0913686602537442 −12.2314622314622 chr15 81134172 81134172 * 5.82590267260604e−06 0.0785643200074141 −11.8460725623563 chr10 133178861 133178861 * 4.13593463961443e−06 0.0650207227971104 −11.4820298828914 chr5 141413367 141413367 * 7.76408812915108e−

0.0901023497242646 −11.0053151407691 chr6 27601892 27601892 * 1.12696056779619e−07 0.00598198535935319 −10.8045212765957 chr15 81134247 81134247 * 1.61260371448941e−

0.0356062114996405 −10.2122216407931 chr21 17454767 17454767 * 1.57513331385615e−06 0.0350413488922337 −10.1555164319249 chr10 89537766 89537766 * 4.24059356977105e−

0.0656161957474253

chr7 103969589 103969589 * 6.37032458331714e−06 0.0619984146752947  −5.3475935828877 chr17 48735209 48735209 *

0.049354247370586    2.86168521462639 chr14 90062594 90062594 * 6.96526939592211e−06 0.0657039652351353

chr5 176630707 176630707 * 4.70708171486644e−06 0.0701633113531462    4.59459459459459 chr2 236169565 236169565 *  8.7670482328519e−06 0.0956901512029661    4.7652326965885 chr13 87677946 87677946 * 7.09937990209725e−07 0.0211380343473178    6.23052959501558 chr11 17544470 17544470 * 3.25102908072152e−

0.0553922940139513    7.59020846867064 chr17 9115741 9115741 * 9.15941125734579e−07 0.0250713348175674    7.77777777777778 chr17 52160362 52160362 * 6.28550414474443e−

0.081440051545509    7.9088253177322 chr10 99236564 99236564 * 4.21794729967644e−06 0.0656161957474253    8.84050589932943 chr4 25245665 25245665 * 8.05135851458758e−08 0.0922541351304587    9.07484890748469 chr2 102620063 102620063 *

0.0882897675169175 9.13461538461538
chr1 89843771 89843771 * 6.11840110519317e−06 0.0740092321987059 9.32642489046632
chr10 29409440 29409440 * 1.43616184275359e− 0.0334048397934732 9.69897818282242
chr1 34177008 34177008 * 2.0333104672896e−06 0.0414358834417034 10.2297892606263
chr7 1647690 1647690 * 8.90170165376592e−06 0.0959577598068528 10.605243526773
chr3 9422752 9422752 * 6.82295671075496e−06 0.096322946992369 10.7162924836528
chr10 38402836 38402836 * 7.60787694357665e− 0.0659675592264456 10.7569721115536
chr2 23381331 23381331 * 6.45880698427465e−06 0.094625420271301 11.0674009965906
chr13 114115441 114115441 * 2.00688445021296e−06 0.0412235620765496 11.1227757197731
chr5 39455822 39455822 * 3.97383310973769e− 0.0631458897627837 11.6715875539405
chr17 7767401 7767401 * 9.85317560721615e−07 0.0258676453893068 12.2577519379845
chr11 1299404 1299404 * 7.02662324507093e−06 0.0857039652361363 12.4037322823389
chr7 56229631 56229631 * 9.25954077480037e−06 0.0990711541344773 12.6457762904371
chr6 160679141 160679141 * 3.391955445635E−06 0.0554880066848954 12.6694944043794
chrX 128165501 128165501 * 7.32783881549843e−06 0.0878051642006674 12.8392662662035
chr6 160679196 160679196 * 7.50383914366681e− 12.9055696139103
chr17 9115755 9115755 * 3.61917824126845e−06 0.0592675321864645 13.1250042727435
chr3 67648882 67648882 * 9.72353899380297e−07 0.0259382970300902 13.2977588171549
chr17 2376166 2376166 * 6.63550000349147e−06 0.0634084755830776 13.3778047301395
chr13 110719297 110719297 * 8.55748057908093e−06 0.0953673297910076 13.5954770528915
chr5 16617985 16617985 * 6.09043427333241e−07 0.0195137083815149 13.8078108400913
chr2 120466164 120466164 * 8.97978301427997e−06 0.0959577598056528 13.8459877890251
chr13 36675352 36675352 * 6.63185693860932e−06 0.09581934278192 14.094129932125
chr7 98293439 98293439 * 2.55905278327878e−06 0.0478936639644374 14.2638830112381
chr5 141340409 141340409 * 2.91719063513637e−06 0.0514905720337703 14.4652837607971
chr1 108640932 108640932 * 4.52019430927724e−06 14.5024071753611
chr20 20365677 20365677 * 6.42058331347275e−06 0.0824650297766311 14.5833333333333
chr1 17579648 17579648 * 5.14707366002753e−06 0.0740092321987059
chr6 164040155 164040155 * 2.31449394623397e−06 0.044590669746554 14.7955809109636
chr6 164040129 164040129 * 6.14142579203506e−06 0.0803391129964423 14.9056603773585
chr10 1644256 1644256 * 7.26996022323746e−06 0.0672890798211452 14.9419582540602
chr14 92323937 92323937 * 6.33397268615149e−06 0.0817088990664582 14.9422345715121
chrX 102769280 102769280 * 2.3819544205987e−06 0.0455923631416016 15.2221105527638
chr6 75493950 75493950 * 1.16229603673797e−06 0.0290345349321572 15.2689429546563
chr10 72326102 72326102 * 1.66613645697655e− 0.0366509424001043 15.3494809688581
chr14 2878 2878 * 3.78860341767504e−06 0.0615292686847942 15.3676072452571
chr2 52572195 52572195 * 2.29713811045743e−06 0.0444013976002324 15.5818353831599
chr7 158154286 158154286 * 6.31946156301339e−06 0.0937784182927845 15.5825201041698
chr5 140876767 140876767 * 6.47272912187058e−06 0.0764422128269312 15.5911587945897
chr14 105137166 105137166 * 9.30176934484768e−06 0.0993426775395648 15.6930154958416
chr7 158671573 158671573 * 6.14602328945107e−06 0.0803391129964423 15.8153638814016
chr6 15504589 15504589 * 1.80400817128343e−06 0.0384376186546574 15.8588076338458
chr19 39336029 39336029 * 0.0802319051571692 15.9090909090909
chrX 25006255 25006255 * 1.46199983883199e−06 16.0092005171533
chr12 125142124 125142124 * 2.11915250516777e−06 0.00150519751372088 16.1316966708088
chr6 157511117 157511117 * 6.1711967992935e−06 0.0804897308137579 16.2272625196497
chr6 150679242 150679242 * 1.8335930004598e−06 0.0387443261450957 16.2345679012346
chrX 104584408 104584408 * 1.10803256096536e−06 0.0279407196102626 16.2540682414698
chr17 74270497 74270497 * 4.93447428179247e−06 0.0721847100589141 16.3354893651534
chr5 142426222 142426222 * 1.51460789516886e−06 0.0342027277222956 16.354156661861
chr6 33908181 33908181 * 8.37424153186831e−06 0.0936883143768792 16.362760747395
chr12 34361526 34361526 * 5.63398219765226e− 0.0776034580536978 16.4900551562761
chr15 24773347 24773347 * 8.0890471917745e−06 0.0922541353104587 18.4955259862353
chr3 11582396 11582396 * 5.30314452351505e−06 0.0753204952188082 16.5818661497873
chr2 64636343 64636343 * 1.771487478579E−06 0.0381150853306115 16.6337683763957
chr1 28612745 28612745 * 2.24757603314162e−06 0.0438749689544513 16.6666666666667
chr6 158869351 158869351 * 3.03955849810629e−06 0.0528608695986296 16.7341462047384
chr12 116726927 116726927 * 9.38518755291245e−07 0.0254999228716589 16.9503206622
chr20 19975153 19975153 * 5.83948863226354e− 0.0785977438122122 17.0103092783505
chr7 928847 928847 * 2.30726259098789e−07 0.00944591221805046 17.0478536242083
chr1 153452347 153452347 * 9.0960369712121e−07 0.0250580669445832 17.072192513369
chr6 151616062 151616062 * 1.07739679765797e− 0.0275235483198223 17.0825967904059
chr17 75828646 75828646 * 5.31492486512653e−06 0.0753204952188082 17.1014219794708
chr2 4380984 4380984 * 5.70066406879171e− 0.0780769936964471 17.1465968588387
chr1 232767857 232767857 * 4.75809525975101e−06 0.070628341303741 17.1577275935074
chr11 3156499 3156499 * 8.99722307684049e−07 0.0209396233880114 17.1949313765457
chr5 41869578 41869578 * 0.0342027277222966 17.3694779116466
chr22 26757154 26757154 * 3.84822502918888e−06 0.0617577667911084 14.4166713718405
chr17 6995809 6995809 * 5.51834419781712e− 0.0765741040514544 17.5181937855493
chr20 19975175 19975175 * 3.55499472990289e−06 0.0585416969719241 17.563025210084
chr12 34361520 34361520 * 6.34375976257521e−07 0.0198929069601162 17.6046176046176
chr1 2955499 2955499 * 4.95345048031268e−06 0.072207374487384 17.6112412177966
chr17 9115517 9115517 * 8.45813208293672e− 0.0047945791923793 17.7335428571207
chr17 6996074 6996074 * 7.53937521614787e− 0.0885403049266502 17.8347754200365
chr2 24304224 24304224 * 9.86884626589841e−07 0.0262073302819578 18.042328042328
chr20 19975167 19975167 * 6.47133536675212e−08 0.0825774316790541 18.0620456905504
chr10 133170251 133170251 * 1.15487541922153e−06 0.0289718780343888 18.1174854002061
chr8 19652708 19652708 * 7.19644872008835e− 0.0870543809739352 18.1283787330079
chr15 100553730 100553730 * 8.72606758944996e− 0.09581934279192 18.1361484252569
chr14 95071900 95071900 * 6.82343982140052e−06 0.0848874818877994 18.1432838035877
chr2 231924719 231924719 * 2.7317235815661e−06 0.0495141213330146 18.1804306504307
chr13 99566717 99566717 * 2.79109069942135e−06 0.0499222052083915 18.1597804949519
chr20 19975163 19975163 * 2.81454084761736e−06 0.0501289877413247 18.428826543385
chr20 19975160 19975160 * 1.10903198558053e− 0.0279407196102626 18.4659090909091
chr20 19975171 19975171 * 0.0105227571945974 18.7731624142323
chr5 1410033 1410033 * 1.49557110781536e−07 0.00654244932579815 18.8241742832741
chr5 50377869 50377869 * 6.19047972859498e− 0.0805629715155258 18.8826205641492
chr3 128434766 128434766 * 2.5680972780805e−06 0.047910837291266 19.0173796791444
chrX 43078653 43078653 * 1.47290098991755e−06 0.0336560559387906 19.0642663443805
chr2 91559538 91559538 * 1.2798441911153e−06 0.0307964310716694 19.0743405275779
chr17 8995833 8995833 * 5.80912520858866e−06 0.0785477267285294 19.1361189435605
chr11 45930246 45930246 * 6.67636406131693e−08 0.09581934279192 19.1517314890558
chr12 4911898 4911898 * 5.1999302791302e−06 0.074406294590844 19.4045661303299
chr7 928902 928902 * 1.42073307431449e−09 0.000157514289349215 19.5591142177787
chr3 184017372 184017372 * 3.09142055769524e−07 0.0116082782869539 19.6161648515909
chr17 6996045 6996045 * 1.3474988752516e−06 0.0319034805099525 19.7234678624813
chr13 112980491 112980491 * 6.05595949101343e−08 0.079570207197914 19.8325344694952
chr3 158801256 158801256 * 1.38068648098188e−07 0.00670439793176869 19.8975199973408
chr14 33475011 33475011 * 6.52813522959558e−07 0.0200445950218647 19.9644633972992
chr8 48331785 48331785 * 1.53438473949785e−07 0.0070002592590762 19.9948407807618
chr18 86681145 86681145 * 2.79446277187232e− 0.0499222052083915 20.0137879832423
chr11 45930309 45930309 * 3.87504524032989e− 0.0520761359197938 20.031746031746
chr16 87954981 87954981 * 3.76282271941594e−07 0.0134443306851155 20.1140964298859
chrX 107717006 107717006 * 2.28477009577724e−06 0.04430760700502 20.1513167466899
chr11 72103654 72103654 * 1.32947831334E−06 0.0316037285944355 20.1845419018671
chr8 1515412 1515412 * 3.27940881714947e− 0.00217231402944738 20.1887816091954
chr7 928884 928884 * 5.776099753223E−07 0.0187099518416535 20.5057107427951
chr7 45548154 45548154 * 3.8464420181436e−06 0.0617677667911084 20.2754976958525
chr17 81255325 81255325 * 2.94355442400736e− 0.00197198303895888 20.2867383512545
chr5 56585077 56585077 * 1.79517057959436e−07 0.00787738633553591 20.332669941025
chr11 1299316 1299316 * 6.4988054748583e−07 0.0200445950218847 20.3922148842589
chr16 86681244 86681244 * 9.56783330012095e−06 0.000794446441441805 20.4052181944124
chr5 142426207 142426207 * 3.34377581110171e−06 0.012243924656305 20.4150688021656
chr5 169299958 169299958 * 3.68583716161777e−06 0.0600256603680437 20.6111957349581
chr10 46911213 46911213 * 2.78829720129378e−06 0.0499222052083915 20.6247827598193
chr5 140842538 140842538 * 7.94033601486232e−06 0.091606712737502 20.5908789778777
chr15 21955106 21955106 * 4.91225838152047e−06 0.0721847100589141 20.7339284509702
chr1 15066339 15066339 * 8.97309417940359e−06 0.0989577598056528 20.8438038670017
chr16 30398862 30398862 * 7.03617940967512e−06 0.0857039652361363 20.9139816965438
chr12 90954040 90954040 * 7.8885532383007e−06 0.0913669129032541 20.9183673469388
chr7 4706798 4706798 * 0.0571220483819076 20.0357900614152
chr18 14748232 14748232 * 1.98227666222751e−06 0.0406508739611971 20.9880300263745
chr5 671276 671276 * 6.75153422827891e−06 0.0958901512029551 21.0102040816327
chr10 133565671 133565671 * 3.12406149733364e−06 0.0538946630166299 21.0105134703839
chrX 19344014 19344014 * 2.8014511627033e−07 0.0108654780971641 21.0144057623049
chr2 109130298 109130298 * 2.61302838066889e−06 0.0484424812103089 21.0662740226077
chrX 15645314 15645314 * 3.6550945343122e−09 0.000353246731107763 21.113030279239
chr20 61394416 61394416 * 5.58178210281432e−07 0.0184393499525971 21.1275210398152
chr13 112689062 112689062 * 7.01003959212573e−06 0.0857999652361383 21.2597245433066
chr12 75391126 75391126 * 6.3123204346952e−06 0.0816081564423863 21.2828770799785
chr8 8893931 8893931 * 2.05853107488063e−06 0.0414358834417034 21.2893700787402
chr10 69337894 69337894 * 0.0825774316790542 21.2999195683804
chr6 41198259 41198259 * 8.88973999706272e−06 0.0853306691228229 21.4809364164223
chr10 121527085 121527085 * 6.98851969048408e−07 0.0209396233880114 21.7685862007488
chr15 101065769 101065769 * 7.17708053664482e− 21.8864666522771
chr22 25757078 25757078 * 3.13168016121987e− 0.0538646630177199 21.8995765275257
chr7 928936 928936 * 0.00513966443612768 21.9341430499325
chr5 82334505 82334505 * 3.9636195228669e−06 0.0631458876257637 22.0043231728329
chr7 1024437 1024437 * 7.62328118855421e−08 0.0889675592264456 22.1361921778837
chr17 9116060 9116060 * 1.55088676018667e−06 0.0347639130761953 22.222456957173
chr16 8644774 8644774 * 1.82713707034827e−06 0.0387443261450957 22.4154068445032
chr17 9116020 9116020 * 1.07750438072861e−07 0.00582520947551778 22.4231601731602
chr19 29671725 29671725 * 4.56454934144679e− 0.0691703554407462 22.4514991181658
chrX 47145243 47145243 * 7.64848063702281e−07 0.0224330589636318 22.4900817308735
chr7 127251316 127251316 * 5.35818201189688e−06 0.075751420262087 22.5546975546976
chr2 3316654 3316654 * 6.16402809086827e−08 0.0928993827561695 22.6568941823179
chr13 18600074 18600074 * 5.18309937342403e−08 0.0743458964634313 22.8951956500997
chr6 13877799 13877799 * 8.95126736806239e−06 0.0959577598056528 22.9665071770335
chr19 37294559 37294559 * 1.6220954636522e−09 0.000150430483497275 23.0094030365769
chr7 108415322 108415322 * 1.24614314085159e− 0.0302403896561507 23.0595268218623
chr2 168499447 168499447 * 5.43835452559394e− 0.0761543594497282 23.0726878669358
chr20 45304879 45304879 * 6.90524030830571e−06 0.0853433705154194 23.2261256803563
chrX 64205954 64205954 * 1.52003175779843e−06 0.0342027277222966 23.3647941417225
chr19 7862084 7862084 * 5.52098871833697e−06 0.0765741040514544 23.3613445378151
chr5 25710448 25710448 * 1.23582956553838e−06 0.0301959625728663 23.4370850374462
chr4 1218866 1218866 * 8.37698638800132e−06 0.0938883143768792 23.4448114505918
chr3 184017291 184017291 * 3.61320552585674e−07 0.0129884758275872 23.4491540840458
chr17 6995830 6995830 * 1.36074437699724e− 0.0320882411227902 23.4804754402543
chr17 49014770 49014770 * 2.06640079572259e−06 0.0414358634417034 23.4702790000165
chrX 153781252 153781252 * 5.5987266684809e−07 0.0184393499825971 23.5335132703554
chr22 137400 137400 * 1.54261018011994e−07 0.0070002592590752 23.7164750957854
chr3 13933198 13933198 * 9.59812325404932e−07 0.0258614954261138 23.7825446209268
chrX 63351268 63351268 * 2.7521979719894e−06 0.0496182325919689
chr2 131830061 131830061 * 1.92095230523419e−07 0.00814725209338524 23.9108409321176
chr11 15069378 15069378 * 4.77078686740945e−10 6.77764433307093e−05 23.9130434782609
chr5 141246131 141246131 * 6.59054316005384e− 0.083376692804708 23.9220696263175
chr2 239231927 239231927 * 5.13244577684371e− 0.0740092321987059 23.9330024813896
chrX 49234431 49234431 * 1.67470205968947e−06 0.0367024154776301 24.0277562075934
chr18 24870694 24870694 * 6.81210387660693e−06 0.0846874816877994 24.1156319910515
chr16 14337567 14337567 * 6.81187505369222e−07 0.020700189258845 24.1374353078481
chr19 19514983 19514983 * 3.26632886702103e−06 0.0553922940139813 24.1576605212969
chr2 239232124 239232124 * 8.27114020935035e−07 0.0234429046248908 24.1879597913497
chr2 131830062 131830062 * 5.40964187031731e−06 0.0761138805996409 24.1937856223571
chr18 14458892 14458892 * 1.18500214028664e−06 0.0295329361879514 24.1952309985097
chr7 27411345 27411345 * 6.65982139582401e−08 0.0543084755630776 24.3007477153143
chr2 3595133 3595133 * 1.06394407915843e−06 0.0273900416385782 24.3924050632911
chr20 19975301 19975301 * 4.7967037772016e−06 0.0709023155809478 24.4444444444444
chr13 99586817 99586817 * 5.90848886095403e−08 24.5086431445839
chr11 45930252 45930252 * 7.45553160300389e−08 0.00426727829797806 24.5537163258682
chr5 68225408 68225408 * 1.09193639577545e−07 0.00582520947551778 24.8198276717996
chr17 16689974 16689974 * 2.03999637741829e−07 0.00852942306639986 24.6512136453094
chr17 78140883 78140883 * 8.8705310348995e−06 0.009581934279192 24.7621554544429
chr1 93592170 93592170 * 3.27918112553502e−06 0.0553922940139813
chr11 58830739 58830739 * 7.47083099100034e−06 24.8744939271255
chr17 4946839 4946839 * 8.91482794177763e−08 0.0248428819074298 24.9246868558744
chr9 63781683 63781683 * 4.23812400795643e−08 0.0586161957474253 25.0008648424257
chr11 45930240 45930240 * 1.53082434776836e−07 0.0070002592590762 25.0130982045876
chr1 93592233 93592233 * 2.83100456694509e−07 0.0109083381311835 25.0330667830588
chr18 1459220 1459220 * 3.38616499749607e−06 0.0564880066848964 25.1025464025641
chr2 131263832 131263832 * 2.08273921361247e−06 0.00150519751372088 25.1808406647116
chr3 2083862 2083862 * 2.53773885194839e−09 0.000267158055850311 25.2170666612228
chr18 14459023 14459023 * 2.84654382432768e−06 0.0505462758889119 25.2670716360394
chr7 89228935 89228935 * 1.81723746139605e−07 0.00787736633553591 25.5062571103527
chr2 11511040 11511040 * 9.25184599489317e−06 0.0990711541344773 25.5118574164089
chr9 55376788 55376788 * 5.75437260355062e−07 0.0187099518416835 25.6744186046512
chr1 16353352 16353352 * 4.65527420517651e−06 0.0700113694379202 25.8187134502924
chr17 6995802 6995802 * 4.79049616505616e−06 0.0709023155809878 25.8361204013378
chrX 15231803 15231803 * 2.73807433589792e−06 0.0495141213330146 25.9012345679012
chr7 928856 928856 * 6.84664612433188e−08 0.00395719163619035 25.9324155193992
chr16 88937576 88937576 * 1.25160657680843e−06 0.0302403896581507 25.9938328194016
chrX 23908198 23908198 * 2.2949207621491e−07 0.00944591221805040 26.234796404019
chr13 99565890 99565890 * 3.58056479295213e−06 0.0587985289984288 25.4184907834101
chrX 52995515 52995515 * 3.12651175527464e−06 0.0538646630166299 26.5151515151515
chr12 121654239 121654239 * 6.73629885054117e− 0.00393196229897439 26.7460319460317
chr10 94383771 94383771 * 1.57870254186226e− 0.00017950025742979 26.8034215111674
chr2 131253902 131253902 * 1.61220629993114e− 0.00121852914927674 27.1579077819275
chr17 44210845 44210845 * 1.94982343283369e−06 0.0404749497369219 27.1787858065497
chr1 1509036 1509036 * 1.45502579270725e− 0.0335073513357785 27.1883269124668
chr22 43129256 43129256 * 8.63505007185784e−06 0.0242952614341285 27.1971829380967
chr20 45304619 45304619 * 5.75997712730997e−06 0.0783485368589402 27.2538272538272
chr2 130455808 130455808 * 2.89043106423357e−07 0.0110649961709326 27.5730778678457
chr21 44921095 44921095 * 2.83830583449049e−07 0.0248428619074298 27.7561770977048
chr1 53146300 53146300 * 5.99018904485462e−06 0.0792393419764003 27.8161886227838
chr16 33029245 33029245 * 2.62618153607755e−06 0.0485487901506917 27.8814732761008
chrX 107776374 107776374 * 1.38742731797769e−07 0.00670439793176869 28.1957186544343
chr9 135485006 135485006 * 7.06210067890364e−06 0.0858423382496341 28.2051282051252
chr22 40536393 40536393 * 2.31951733013113e−10 26.2352941176471
chr19 33442882 33442882 * 1.73077443193989e−09 0.000188953997315065 28.3827780313403
chrX 64395518 64395518 * 5.54626257453056e−06 0.0785741040514544 28.5048247625287
chr7 928941 928941 * 1.62734713385636e−07 0.0073234931621066 28.8420376712329
chr12 2106958 2106958 * 1.85235829723081e−08 0.00136231581398794 28.9473684210526
chr13 99565943 99565943 * 8.72358591444338e−06 0.09562934279192 28.989080662205
chr2 131830531 131830531 * 4.93224498006621e−06 0.0721847100589141 29.6087533156499
chr9 63781708 63781708 * 2.0943910600261e−06 0.00150519751372088 30.2474247292239
chr4 1883408 1883408 * 7.2163450433967e−06 0.0870543809739352 30.3359462486002
chr19 37294432 37294432 * 3.13392568998976e−06 0.0538646630168299 30.3885590933621
chr12 120103765 120103765 * 1.29292545702216e−07 30.6754560231172
chr6 44311573 44311573 * 6.96283160484396e−16 2.56551927582013e−10 30.7361762460485
chr2 89649492 89649492 * 3.80279779206735e−06 0.0615901443820628 30.8474576271186
chr1 53146299 53146299 * 1.01516567163954e−06 0.0286055408882385 31.2574671445635
chr8 142832875 142832875 * 4.57395088049963e−06 0.070077245951875 31.3136979897148
chr18 76767817 76767817 * 9.60701295160221e−07 0.0258614954261138 31.5458937198068
chr18 76349531 76349531 * 2.52388135007034e−06 0.00173013445877657 31.6992645216443
chr14 52122802 52122802 * 3.0710009730901e−07 0.0116055233164768 31.7062085264976
chr16 33029286 33029286 * 1.70183579048149e−07 0.00760069130447552 31.7708333333333
chr5 33502742 33502742 * 7.49883997457326e−06 0.0882987675169175 31.8063305000752
chr2 3125150 3125150 * 7.38867223422674e−06 0.068175881558769 31.8300065935383
chrX 154805593 154805593 * 6.62749295173053e−06 0.0834084755630776 31.9047619047619
chr7 151174648 151174648 * 1.51846708716497e−06 0.0342027277222966 32.2258592471358
chr14 100883428 100883428 * 5.71978412508625e−06 0.0781769936084471 32.367244905659
chr14 99040860 99040860 * 3.52457275698086e−07 0.012747594857546 32.4862385321101
chr4 16256532 16256532 * 2.24632470908301e−06 0.0438748689544813 32.5367813661219
chr 1769483 1769483 * 9.05358605864304e−06 0.0975759257953621 32.6356641697454
chr13 110639275 110639275 * 2.57719038357563e−07 0.0101289543974943 32.861744866443
chr17 3697140 3697140 * 4.32888016037048e− 0.00280442315019776 32.8896916690433
chr7 1759295 1759295 * 1.85698686464837e−06 0.0390985069340864 32.9693295539253
chr14 100883427 100883427 * 6.79411347487157e− 0.0646874616877994 33.0891866781425
chr14 23580472 23580472 * 1.17801547070557e−06 0.0293029956589775 33.1623931623932
chr7 89228934 89228934 * 5.7213360363665e−06 0.0780769936064471 33.2051282051282
chrX 154398474 154398474 * 7.01302409398192e−06 0.0857039652361363 33.3094213295074
chr11 8313237 8313237 * 1.48263199504415e− 0.001165417344545 33.5684729064039
chr9 63781672 63781672 * 8.12820452220173e−11 1.45207835804088e−08 33.9124889054726
chr2 240913913 240913913 * 6.46215759307487e−05 0.0825774316790542 34.0201465201465
chr18 14179615 14179615 * 7.034048053248E−06 0.0857039652361363 34.3085105382979
chr10 112842985 112842985 * 8.1277498359986e−06 0.092960282194563 34.3089030988867
chr6 170168066 170168066 * 5.95588546571617e−06 0.0792393419764003 34.4703389830508
chr7 1759296 1759296 * 7.75922693444485e−09 34.5021037868163
chr7 158973450 158973450 * 2.98695163460174e−06 0.0522530956076654 34.8612172141564
chr10 103456497 103456497 * 1.82355256133036e−12 4.47968395537135e−07 35.2497195764178
chr4 16360900 16360900 * 1.76438224548031e−06 0.0370817576772098 35.9553123575011
chrX 142971347 142971347 * 6.97291165033871e−07 0.0209396233880114 35.9930110658125
chr12 5889121 5889121 * 4.19457086012719e−07 0.0146509416988426 35.993265993266
chr12 5889122 5889122 * 3.24681426757704e−10 4.90797352813289e−07 36.144576313253
chr12 132849363 132849363 * 1.42672418428752e−07 0.00678909204811647 36.3135018465184
chr7 2714483 2714483 * 2.63523298089516e−06 0.0485487901506917 36.3550063907701
chr20 18227868 18227868 * 1.89319155567353e−06 0.0394382373080044 36.5421455939697
chr7 158973719 158973719 * 6.2576347272967e−06 0.0612575515670304 36.7965367965368
chr13 110539274 110539274 * 3.65465862545053e−11 7.42947613146915e− 37.1058067648038
chr7 1769484 1769484 * 9.76538684309593e−10 0.000125152665483359 37.4363861993184
chr18 1458079 1458079 * 2.73473845547679e−06 0.0495141213330146 37.5944737911418
chr7 158973714 158973714 * 4.24186387872778e−07 0.014710151948704 38.110861288556
chr12 13095512 13095512 * 2.26280767494508e− 0.00156809957596004 38.4369114877589
chr7 34939470 34939470 * 5.6620661324981e−07 0.0185443646548435 38.6921850079745
chr18 14458993 14458993 * 2.39376120192636e−06 0.0456700741763526 38.9165835825856
chr12 132843222 132843222 * 3.38268172042491e−07 0.012309927489174 39.1886597570966
chr11 50279012 50279012 * 4.30883976796929e−06 0.0683240355080424 39.314606741573
chr7 1764018 1764018 * 1.22055159642621e−06 0.029981563301402 39.3270049898556
chr2 9245093 9245093 * 5.43430493275317e−06 0.0761543594497282 40
chr7 100847116 100847116 * 3.64229197056794e−06 0.0594808172903373 40.0865800685801
chr16 8902070 8902070 * 1.16752761648575e−07 0.00603770316698126 40.1101946516082
chr6 59258597 59258597 * 1.15209360303514e−07 0.00601061200563929 40.3251231527094
chrX 100406950 100406950 * 2.06158285185436e−06 0.0414358834417034 40.3743315508021
chr1 151852674 151852674 * 2.25762286037092e−10 3.6957728469288e−05 40.4853833425262
chr13 110569482 110569482 * 2.34599317702309e−06 0.0016278046100216 40.655443109808
chr4 141247044 141247044 * 6.53851793505649e−06 0.0829440737732974 40.8432147562582
chr16 88006372 88006372 * 1.03047403394103e−06 0.0286503210330817 40.983606557377
chr11 76025058 76025058 * 3.34086790592703e−06 0.0552595792448322 41.2037087037037
chr11 15072035 15072035 * 2.46322768519033e−06 0.0457065024985586 41.6014924302217
chrX 153606805 153606805 * 6.09115908826083e−16 2.39396643050005e−10 41.5272655834358
chr2 90245085 90245085 * 6.72528026354226e−06 0.0842118280264895 41.8832891246654
chr16 5185557 5185557 * 80.9899999484523e−07 0.0231778715158391 42.0454545454545
chr6 118688131 118688131 * 7.63612132900429e−06 0.0889675592284456 42.2027792074112
chr2 90245076 90245076 * 1.00357097038975e−06 0.0255309377110954 42.3719958202717
chr10 103458498 103458498 * 8.87300946042895e−32 1.74364899011705e−25 43.0265356767126
chr19 7554442 7554442 * 1.07846741577602e−06 0.0275235483198223 43.2249322493225
chr8 71668017 71668017 * 2.09855280766445e−06 0.0417962736452911 43.4541203974284
chr17 40014144 40014144 * 3.36438847291582e−06 0.0564880066848964 43.466172381835
chr8 44311572 44311572 * 4.0038430934445e−13 1.02921153475302e−07 43.5798882855708
chr8 23344333 23344333 * 4.25627604564246e−09 0.000389406859984431 43.7608434538903
chr16 8902069 8902069 * 6.27942692393337e−06 0.60916294830927e−05 43.8190490844745
chr 1489565 1489565 * 2.52602036762017e−06 0.0474635582490079 43.9298245614035
chr13 114103908 114103908 * 1.30173061753499e−07 0.006503520677028 44.4522144522144
chr17 1517860 1517860 * 2.98607382645789e−06 0.0522530956076884 44.4577831132453
chr12 131585340 131585340 * 6.1255332108104e−07 0.0195200777501359 44.4612455197133
chr11 3617050 3617050 * 1.56072555590698e−08 0.00119493749660254 44.5192307692308
chr2 194200772 194200772 * 1.83775461270924e−07 0.0079081760166216 45.2539912917271
chr4 4037085 4037085 * 4.62505552560616e−06 0.0699142129015274 45.7295968934993
chr4 87927456 87927456 * 5.15240314806726e−11 9.49225138804652e−06 45.890823738383
chr5 127476033 127476033 * 4.65493880975636e−06 0.0700113094379202 46.9618445215282
chr17 1526574 1526574 * 1.1458599113541e−14 3.75291210601205 47.354790972993
chr14 57366558 57366558 * 3.2636021578232e−09 0.000326442866774403 47.4576271186441
chr11 1872395 1872395 * 1.42527266706343e−06 0.033343162468073 47.7839514149115
chr1 15219581 15219581 * 6.8864757723842e−19 4.51090714281087e−13 47.8225778225478
chr5 92374220 92374220 * 3.15624406540522e−07 0.0116585804267786 48
chr5 1705803 1705803 * 1.11374371059014e−09 0.000136769700895759 48.206599713056
chr1 15219580 15219580 * 1.94284510722883e−08 0.00143171825929577 48.250055029716
chr11 71589734 71589734 * 1.19350487339216e−08 0.0294398551248445 48.2698412898413
chr5 17095802 17095802 * 3.84338065355121e−19 2.8325578257214e−13 48.2810474462017
chr5 1926307 1926307 * 5.99468471007297e−06 0.0792393419764003 48.936170212786
chr19 3831767 3831767 * 9.92393775997153e−08 0.00541713487976249 49.8668280871671
chr7 34939471 34939471 * 1.98497366394687e−10 3.34345958133233e−05 50.2493287303414
chr1 153348938 153348938 * 4.40575383775038e−08 0.00282386085625328 51.3026452489411
chr9 63703081 63703081 * 5.77752168001692e−06 0.0783425388589402 51.8048128342246
chr9 122620923 122620923 * 7.429894660979E−06 0.662377435168626 52.6705276705277
chr12 95638101 95638101 * 5.71858504259934e−06 0.00344016660304417 52.7537633758539
chr10 111583910 111583910 * 4.95882892516532e−06 0.00312952931851425 52.7851458885942
chr10 131724260 131724260 * 1.23715424482524e−18 6.07802535514812e−11 52.8746177370031
chr19 3623239 3623239 * 8.97578013043069e−07 0.0248428819074295 53.0175792184528
chr10 132803598 132803598 * 2.13628040558972e−06 0.0424044251560342 53.2599928859333
chr11 71589733 71589733 * 0.0482592293217e−08 0.00357394427630267 53.3441316769469
chr16 30710079 30710079 * 1.01993186164307e−06 0.0266055408862385 53.4782608695652
chr2 130697316 130697316 * 2.65157858657125e−08 0.00179677883512725 54.689366786141
chr22 50471299 50471299 * 1.39941300365533e−11 3.17305663571782e−06 55.8558558558559
chr11 17567503 17567503 * 2.08475930379464e−06 0.041662902941552 56.7846100996221
chr19 3823238 3823238 * 1.78037894556353e−06 0.0381670970021494 57.0016939582157
chrX 144231760 144231760 * 7.22087983908766e−06 0.0570543809739352 57.133152173913
chr11 71590085 71590085 * 5.33441584465839e−06 0.00327885938881253 57.7435257561811
chr5 149427683 149427683 * 5.39341347686118e−07 0.018045942850302 59.3939393939394
chr10 132803137 132803137 * 5.97330999592518e−06 0.0792393419764003 60.3435571520678
chr5 150098459 150098459 * 3.09372760861357e−18 1.82385977762572e−12 61.1111111111111
chr22 50471298 50471298 * 3.39559018818451e−09 0.000333734627440756 61.2155745489079
chr22 37103346 37103346 * 5.92202191736626e−07 0.0190778000789806 61.7066287653523
chr11 71612643 71612643 * 9.79275135454918e−08 0.00539548292708294 63.1568828505511
chr11 71602277 71602277 * 2.51302646524811e−07 0.010078342199198 63.519882179676
chr11 71590064 71590064 * 4.08725788966055e−09 0.000352473072032365 64.4620720856583
chrX 144230634 144230634 * 5.29944949997185e−06 0.0753204952188082 64.872325741891
chr10 131659639 131659639 * 5.35757653519535e−10 9.94851868850657e−05 66.2058371735791
chr10 131659638 131659638 * 7.45230122527055e−10 9.95497765261251e−05 67.9128856624319
chr14 90556073 90556073 * 1.05901238808213e−11 2.49729819730085e−06 69.3899018232819
chr11 71602082 71602082 * 1.86133735136998e−11 3.91901056085249e−06 73.0769230769231
chr14 90556042 90556042 * 4.88075839593414e−17 2.50860756203605e−11 74.3318409985077

indicates data missing or illegible when filed
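The rows in these tables follow the column order given in the Table 3 header (chr, start, end, strand, pvalue, qvalue, meth.diff), and some values in the filed text are truncated (for example, a p-value ending in "e−" with no exponent digits). The sketch below shows one possible way to load such rows programmatically, assuming one row per line: it converts the Unicode minus sign (U+2212) to an ASCII hyphen, drops rows with missing or illegible fields instead of guessing them, and splits the remaining sites by the sign of meth.diff under a q-value cutoff. The names `Site`, `to_float`, `parse_row` and `split_by_direction`, and the 0.05 default cutoff, are illustrative assumptions and are not part of the disclosure.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple

UNICODE_MINUS = "\u2212"  # minus sign used for negative numbers in the filed tables


@dataclass
class Site:
    chrom: str
    start: int
    end: int
    strand: str
    pvalue: float
    qvalue: float
    meth_diff: float


def to_float(token: str) -> Optional[float]:
    """Convert one table token to a float, or return None if it is truncated."""
    try:
        return float(token.replace(UNICODE_MINUS, "-"))
    except ValueError:
        # e.g. "1.43616184275359e−" (exponent lost) or a lone "−"
        return None


def parse_row(line: str) -> Optional[Site]:
    """Parse 'chr start end strand pvalue qvalue meth.diff'; None if incomplete."""
    fields = line.split()
    if len(fields) != 7 or not (fields[1].isdigit() and fields[2].isdigit()):
        return None
    numbers = [to_float(tok) for tok in fields[4:]]
    if any(n is None for n in numbers):
        return None
    pvalue, qvalue, meth_diff = numbers
    return Site(fields[0], int(fields[1]), int(fields[2]), fields[3],
                pvalue, qvalue, meth_diff)


def split_by_direction(lines: Iterable[str],
                       q_cutoff: float = 0.05) -> Tuple[List[Site], List[Site]]:
    """Split sites with qvalue < q_cutoff by the sign of meth.diff (hypothetical cutoff)."""
    positive: List[Site] = []
    negative: List[Site] = []
    for line in lines:
        site = parse_row(line)
        if site is None or site.qvalue >= q_cutoff:
            continue
        if site.meth_diff > 0:
            positive.append(site)
        elif site.meth_diff < 0:
            negative.append(site)
    return positive, negative


# Usage with three rows copied from the tables (the last has a truncated meth.diff).
rows = [
    "chr14 90556042 90556042 * 4.88075839593414e−17 2.50860756203605e−11 74.3318409985077",
    "chr2 85868078 85868078 * 3.5194993513349e−08 0.0024216712743042 −67.3618007282357",
    "chr1 3037601 3037601 * 1.49094094034888e−07 0.00668118895524527 −",
]
positive, negative = split_by_direction(rows)
print(len(positive), len(negative))  # prints "1 1"
```

Dropping incomplete rows keeps the analysis consistent with the filed note that some data are missing or illegible. A value that merely parses, such as `80.9899999484523e−07`, may still be a transcription error, so this check enforces completeness, not correctness.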

TABLE 3 Normal versus Mild Preeclampsia
chr start end strand pvalue qvalue meth.diff
chr2 85868078 85868078 * 3.5194993513349e−08 0.0024216712743042 −67.3618007282357
chr17 78802280 78802280 * 6.13863318081308e−08 0.00347585652497927 −60.3174603174603
chr4 27977807 27977807 * 6.00742796105181e−06 0.0750690131438125 −58.6326950466846
chr11 68373945 68373945 * 5.63363012523847e−06 0.0718853055408169 −54.2424242424242
chr4 169154580 169154580 * 2.24673501570411e−06 0.00172010475820479 −53.5947712418301
chr7 151724728 151724728 * 1.73522362030776e−06 0.0338074675249276 −52.9741560232656
chr5 12865273 12865273 * 7.21504868351372e−07 0.0195713318720594 −52.8753292361721
chr1 172748286 172748286 * 6.19845329082826e−11 1.34773564953995e−05 −52.6436781609195
chr22 45185017 45185017 * 7.60715104015446e−06 0.0842174212589503 −52.5516447131649
chr9 580319 580319 * 4.7056910445104e−08 0.00291084454940201 −52.3475609756098
chr13 112760954 112760954 * 4.16617535963161e−09 0.000435507455756272 −51.0442773600668
chr4 128742393 128742393 * 6.14079857577175e−08 0.00347585652497927 −50.3827227993439
chr6 167179087 167179087 * 2.08281818520008e−08 0.00161738984826254 −49.4380360839988
chr19 55209873 55209873 * 1.79574052904895e−06 0.0348615702805375 −46.0450636430594
chr5 102752373 102752373 * 4.07786605470401e−07 0.0129796760687627 −47.779013718947
chr5 112628386 112628386 * 6.91364726458152e−08 0.00383479837203912 −45.9870782483848
chr1 205850058 205850058 * 1.99409658808528e−10 3.737744217138023e−05 −44.9964763918252
chr1 205849819 205849819 * 4.06367622655203e−07 0.0129796760687527 −44.7817836812144
chr1 3427237 3427237 * 4.94921003694122e−16 6.72569680983604e−10 −44.6690873081367
chr7 1005317 1005317 * 0.00553706552855676 −44.6091115815886
chr1 205850082 205850082 * 1.0874671424508e−08 0.000985204269798062 −44.0997800014194
chr1 205850052 205850052 * 5.92294789074457e−13 2.92689146798765e−07 −44.0811772681053
chr1 81426518 81426518 * 7.0075613123647e−07 0.0192381412046252 −44.055171984823
chr1 205850104 205850104 * 8.32717753369843e−10 0.000115648611832455 −43.7239521965999
chr22 31248629 31248629 * 1.30305810600284e−10 2.52968906182125 −43.672514619883
chr1 205850063 205850063 * 8.95932248908814e−10 0.000118782566448294 −42.9183300066534
chr15 77908672 77908672 * 8.25765425136711e−06 0.0887089768390727 −42.8461538461538
chr9 28387669 28387669 * 6.78713472366506e−06 0.0805531232868902 −42.7870166299339
chr19 38499706 38499706 * 4.0105344158003e−08 0.00269140230907561 −42.5654786210115
chr18 31045897 31045897 * 5.76796584170878e−06 0.0730847523432371 −42.5531914893617
chr3 134916416 134916416 * 3.86432738108714e−06 0.0569257731851756 −42.3262032085562
chr5 64638081 64638081 * 1.50636275359321e−09 0.000186096533876145 −42.0508274231678
chr8 141170848 141170848 * 2.44040133950921e−08 0.00921213204921482 −41.7976760082023
chr10 132731408 132731408 * 1.81067249455354e−08 0.00757799771517082 −41.6142557651992
chr22 19586819 19586819 * 2.19977848674928e−08 9.19807596799142e−07 −41.2582675910431
chr2 3322275 3322275 * 7.75243231045903e−06 0.085019457373052 −40.9548921744044
chr1 205850124 205850124 * 1.15667512203153e−08 0.000998003897585052 −40.8009197164208
chr15 42015868 42015868 * 1.7744955377753e−07 0.0075357474276211 −40.5063291139241
chr2 1767572 1767572 * 5.10151486155981e−06 0.0678011781053311 −40.4225598044351
chr1 205849960 205849960 * 2.10584671188882e−06 0.0388030739134601 −40.3949967083608
chr1 205849963 205849963 * 5.83181082753211e−10 8.56767720789663e−05 −40.342413115874
chr1 205879990 205879990 * 4.71467738890653e−07 0.0143172741726291 −40.1776700989299
chr8 30580122 30580122 * 1.7559577996275e−07 0.00751573996245069 −39.9242424242424
chr3 184636251 184636251 * 1.56812206485296e−11 5.14503005743297e−06 −39.5348837209302
chr20 29324391 29324391 * 3.20372220684376e−06 0.0518442908913507 −39.4916911045943
chr9 138148159 138148159 * 6.72911273393646e−06 0.0800392473571132 −39.4342290893015
chr6 22061410 22061410 * 8.05457544212907e−08 0.00420988959320805 −39.3617021276596
chr1 3019823 3019823 * 7.37248524298958e−06 0.0840150180277284 −39.0967601178139
chr19 38499705 38499705 * 3.30451465849165e−08 0.00230289677733946 −38.4651686761661
chr1 205849983 205849983 * 2.3854596580419e−07 0.00913087271952168 −38.3088482673545
chr1 205849966 205849966 * 6.20884365998158e−09 0.000624997609944309 −38.2948958791319
chr1 205850010 205850010 * 3.82525017154793e−07 0.0126019366218156 −38.1735911147676
chr10 129046340 129046340 * 3.81908059518629e−06 0.0565902170382831 −37.5781042676896
chr16 66566575 66566575 * 1.94541872919422e−09 0.000224996947814191 −37.5532338613315
chr22 20667703 20667703 * 4.66511945897494e−06 0.0646024380866661 −37.4254049445865
chr11 3513408 3513408 * 3.45106486370145e−06 0.0539057720069594 −36.8334917115405
chr1 102235052 102235052 * 1.14434594785233e−06 0.0257160290067815 −36.8228849665246
chr1 163457381 163457381 * 2.3776888675431e−08 0.00177049031866852 −36.6312508853945
chr12 33075435 33075435 * 1.67889766512072e−06 0.0330656096982751 −36.4961581183368
chrX 26557760 26557760 * 4.33954022125379e−06 0.061750680383889 −35.8247524361857
chr1 3037601 3037601 * 1.49094094034888e−07 0.00668118895524527 −
chr6 168365090 168365090 * 2.21567226328442e−09 0.000250914451693387 −35.6463161861692
chr11 3509333 3509333 * 4.51713330314854e−06 0.0630098770047527 −35.2490421455939
chr1 205849961 205849961 * 7.78431846852264e−08 0.0041787742269341 −35.0458960876927
chr10 16779672 16779672 * 6.17403446716301e−06 0.0759290860445895 −34.7663247028171
chr1 163457380 163457380 * 3.82072721438564e−06 0.0565902170382831 −34.7237642083003
chr3 184636250 184636250 * 8.17991192563509e−14 4.94046139540635e−08 −34.7050728238847
chr21 44647264 44647264 * 2.01864170955556e−06 0.037887088288121 −34.4262295081967
chr7 6827387 6827387 * 2.22558665429373e−07 0.00896651164504118 −34.3949044585987
chr16 89227775 89227775 * 3.39275313069829e−06 0.0533012675267715 −34.3201288586803
chr1 3427238 3427238 * 1.9052592842424e−07 0.00778688637634848 −33.7758534316181
chr1 205850256 205850256 * 4.90916426283835e−08 0.00293242939139185 −33.3575757575758
chr20 63035180 63035180 * 1.77780270352192e−09 0.000210081167672482 −33.1344118606928
chr10 15534759 15534759 * 5.38024743424222e−07 0.0156394701158399 −33.0625272015088
chr1 3021702 3021702 * 1.37827391839617e−06 0.0292655684785 −32.5902915272752
chr12 132397001 132397001 * 2.38615965648831e−10 4.28858148243318e−05 −32.1428571428571
chr8 130356676 130356676 * 1.38664713585591e−06 0.0293287955185215 −31.9586312563841
chr11 3514122 3514122 * 7.54594048629173e−07 0.0199116636412291 −31.8506493506494
chr14 102870468 102870468 * 1.43061391863649e−07 0.00668118895524527 −31.8193660442901
chr1 3021701 3021701 * 2.34477902190371e−06 0.0420649796789997 −31.7332626119755
chr8 141170849 141170849 * 1.6090718815087e−11 5.14503005743297e−06 −31.6405612661119
chr1 3064014 3064014 * 1.29773472846529e−07 0.00624264840344132 −30.8998133003274
chr1 3037602 3037602 * 7.1015688340521e−06 0.0828380326767182 −30.7593769215003
chr11 44964070 44964070 * 6.70420308835351e−06 0.0799178351719117 −30.2449810198411
chr1 3037639 3037639 * 2.02816676163321e−07 0.00804719427047447 −29.9275760909424
chr3 56718947 56718947 * 4.37372185793998e−09 0.000448576670917393 −29.9081920903955
chr10 48143 48143 * 4.7167014617407e−08 0.00291084454940201 −29.5219159182558
chr2 27442609 27442609 * 1.00949626362308e−05 0.0999525235500815 −29.4665718349929
chr11 12263379 12263379 * 1.06493930045846e−06 0.0247743913021197 −29.2680210282456
chrX 44310321 44310321 * 9.18197996635613e−07 0.0225842389002894 −29.1681079211346
chr4 3746827 3746827 * 6.96526254368032e−06 0.0814227807271212 −29.1328828828829
chr3 164004174 164004174 * 2.03729755505485e−06 0.037887088288121 −29.0270070292268
chr21 46296433 46296433 * 7.47949420766429e−06 0.0842174212589503 −28.9809863339275
chr12 124373681 124373681 * 1.60188462882896e−08 0.00132565058057724 −28.8348985980672
chrX 4430373 4430373 * 6.10455334794975e−07 0.0173732839635593 −28.8343377053054
chrX 44310310 44310310 * 9.56085149428863e−07 0.0230980568757846 −28.7814078522341
chr22 49056296 49056296 * 1.59042499855762e−06 0.0320192253929983 −28.3665407076605
chr11 3514056 3514056 * 8.16779083433761e−06 0.0880517957435133 −27.7608401084011
chr4 80197352 80197352 * 4.3595024220583e−06 0.06187276741191 −27.250936329588
chrX 44210362 44210362 * 3.61680160045793e−12 1.40429400735197e−06 −27.1900464797228
chr2 170710457 170710457 * 5.08393634459985e−06 0.0677331593128208 −27.1844660194175
chrX 44310376 44310376 * 2.19768070891085e−06 0.0399534971832748 −27.0766928544706
chr3 9946410 9946410 * 4.22291676057923e−07 0.0132686827348675 −26.4705882352941
chr10 129511181 129511181 * 3.88951239902586e−06 0.0570144381016749 −26.1064578722076
chr17 82205372 82205372 * 6.91865435316407e−09 0.000683786208143929 −26.0267261791362
chr8 133062901 133062901 * 1.58663673825078e−06 0.0320192253929983 −26
chr2 54976913 54976913 * 3.92736270705501e−07 0.0128544689445323 −25.6405202995664
chr17 82205371 82205371 * 1.35259720519151e−08 0.00114881448757073 −25.5639097744361
chr8 133062902 133062902 * 4.72055907788727e−06 0.0647977078309181 −25.2553693033002
chrX 44310363 44310363 * 3.0568152516151e−08 0.00215794245032894 −25.074725011956
chr3 9946415 9946415 * 1.91077871827886e−06 0.036444074654644 −24.9589490968801
chr7 128271403 128271403 * 4.15985869640936e−06 0.0599789170971243 −24.9576181394363
chr10 133215684 133215684 * 5.97344697921834e−06 0.075013789176032 −24.7714069209396
chr6 33521699 33521699 * 5.06317248260392e−06 0.0676222637308449 −24.5228245828246
chr16 72877874 72877874 * 2.8222481563497e−07 0.0104361245917236 −24.5461273890897
chrX 44310393 44310393 * 2.04771099783168e−08 0.00161317307268514 −24.5285580665601
chr1 218217201 218217201 * 3.15086638737154e−06 0.0512796334908626 −24.3989397446427
chr11 44834245 44834245 * 9.04979987108319e−06 0.0935221803126185 −24.0559696932665
chr5 13810207 13810207 * 6.8148686533429e−06 0.0805545754455256 −23.8458566737493
chrX 44310356 44310356 * 9.19463608933693e−06 0.0941242256005261 −23.8120693751485
chr4 69658238 69658238 * 4.6858747658948e−06 0.0646481122327589 −23.5000252308624
chr5 142375677 142375677 * 7.43846107282538e−05 0.0842174212589503 −23.4626072248399
chr2 217151607 217151607 * 8.09371351800198e−05 0.0874663966294308 −24.4375
chr8 95073358 95073358 * 8.29263379253513e−06 0.0889090397652541 −23.2689832689833
chr9 138123005 138123005 * 2.56199816863536e−11 7.17415789972436e−06 −22.6535415384916
chr13 29503282 29503282 * 4.65835941888654e−08 0.00291084454940201 −22.0588235294118
chr9 124838915 124838915 * 9.65643759778869e−06 0.0972040970864485 −21.6374269005848
chr6 57576937 57576937 * 7.58831019639142e−06 0.0842174212589503 −20.9876543209877
chr4 8975138 8975138 * 2.36994988415395e−08 0.00177049031866852 −20.8104112471036
chr22 15263712 15263712 * 4.99613504645456e−08 0.00295194141152244 −20.662385877183
chr20 58852896 58852896 * 4.85861078536543e−07 0.0145111594413236 −20.5533887703289
chrX 44310418 44310418 * 3.18657761623654e−07 0.0111421082907189 −20.5533887703289
chr6 165876732 165876732 * 5.00828194858739e−06 0.0672129485204955 −20.0587043265715
chr7 157994682 157994682 * 1.00838152158822e−05 0.0999525235500815 −19.9902959728287
chr19 38976130 38976130 * 8.89952692923179e−07 0.0224868678698749 −19.8201155462185
chr12 1519454 1519454 * 9.51916762309728e−06 0.0963911891443587 −19.2817087507933
chr19 50133862 50133862 * 5.29888080543803e−06 0.0687971092962942 −19.1765842742346
chr14 104851983 104851983 * 5.256835520871e−06 0.0686900397879824 −19.0168490533264
chrX 44310597 44310597 * 1.50262289414418e−06 0.0307064603751643 −19
chr10 125023522 125023522 * 5.25685147171326e−06 0.0686900397879824 −18.3318813333945
chr2 144522921 144522921 * 7.20103464503185e−06 0.0880173843820925 −17.1608925425719
chr2 218608011 218608011 * 3.10787704929277e−06 0.0508212671698537 −16.9748090085485
chrX 44310441 44310441 * 2.25038216983616e−06 0.0407752296295299 −16.7364507924085
chr13 25096005 25096005 * 2.44576073169025e−10 4.28858148243318e−05 −16.0790011839414
chr12 131840590 131840590 * 6.81284556136893e−07 0.0188247134508835 −15.8986928104575
chr5 502298 502298 * 7.32287737422104e−06 0.083759733246032 −15.617780091764
chr20 29324198 29324198 * 8.65861964295904e−06 0.09210626131328 −15.4665314401623
chr2 110212166 110212166 * 8.84368595140172e−07 0.0223591991436254 −15.2765957446809
chr13 25096001 25096001 * 6.32943826954945e−06 0.0764564355116043 −14.9168062534897
chr13 25096012 25096012 * 1.31589577987864e−07 0.00627448426927547 −14.4155844155844
chr17 82861233 82861233 * 5.45495194711744e−06 0.0704097356507249 −14.2081447963801
chr2 869008 869008 * 2.60457766326837e−06 0.0449326131402911 −
chr17 8226706 8226706 * 6.26829788264717e−06 0.0762540805535126 −
chr20 62799630 62799630 * 3.24363770746426e−06 0.0521339513270659 −13.3646552700102
chr3 158104239 158104239 * 2.42084069740981e−06 0.0427849084025574 −13.2600919775166
chr13 25096033 25096033 * 1.7308866422865e−06 0.0338074575249276 −12.9296066252588
chr2 63055926 63055926 * 6.04540592624519e−06 0.0752135681501139 −12.1212121212121
chr10 126918347 126918347 * 1.31421353359581e−06 0.0282362361911788 −11.9043006586594
chr16 57591317 57591317 * 1.63522053441961e−06 0.0325593013379642 −11.7856526243094
chr10 127194852 127194852 * 1.14487449618787e−06 0.0257160290067815 −11.3061760492386
chr5 141189533 141189533 * 4.23822707387975e−06 0.0606264329849779 −10.4203389830508
chr17 79079547 79079547 * 8.0514906785991e−06 0.0871834336493453 −9.15492957746479
chr3 12554272 12554272 * 5.15835485708764e−06 0.0680574058614823 −9.03543719857412
chr17 18698467 18698467 * 3.37397705540429e−06 0.0531599326213426 4.81481481481481
chr12 46831525 46831525 * 1.46787107548532e−06 0.0301094921791514 5.1640350877193
chr22 20268607 20268607 * 7.75085541403057e−06 0.085019457373052 6.03015075376884
chr2 227324859 227324859 * 8.76241329583653e−06 0.0923419046394581 6.82926829268293
chr3 49686913 49686913 * 6.63026139255662e−06 0.0792101153645342 7.18562874251497
chr19 48837317 48837317 * 5.95262870231243e−06 0.075013789176032 7.8740157480315
chr19 19170227 19170227 * 6.26547971610748e−06 0.0762540805536126 8.37988826815642
chr10 122114233 122114233 * 2.42992025342566e−06 0.00178493207238441 8.55263157894737
chr8 98748402 98748402 * 5.42434191178619e−06 0.0702035636457087 8.60760921679781
chr7 153749251 153749251 * 1.00568402479447e−05 0.0999525235500815 8.66547085201794
chr15 59206996 59206996 * 5.61937943310756e−06 0.0718721802791079 9.34258949481432
chr14 56805622 56805622 * 9.01381221769034e−07 0.0224868678698749 9.46401394606871
chr7 150737454 150737454 * 2.78826334492208e−09 0.000309313661132254 9.51231608530336
chrX 9890831 9890831 * 3.55431923686348e−06 0.0548877161802339 9.69402024948258
chr14 76131559 76131559 * 6.50249846229761e−06 0.0780472704441361 10.3896103896104
chr15 78810637 78810637 * 1.67390804690697e−06 0.0330656096982751 10.4477611940298
chr8 98949296 98949296 * 3.31967115855298e−06 0.0525531981597828 10.6397371472748
chr21 29019121 29019121 * 4.11629724581241e−06 0.0595086743219646 10.8333333333333
chr14 65279792 65279792 * 3.78902349229063e−06 0.0565902170382831 10.9355742296919
chr4 182895022 182895022 * 8.07231904374769e−07 0.0208949057036708 10.9874881743927
chr14 103158052 103158052 * 9.25268817307087e−06 0.0942489382327313 11.1216181931407
chr22_KI27073 147890 147890 * 2.99931586206499e−06 0.0495550248656549 11.1397962767826
chr16 87644983 87644983 * 2.67899555916917e−06 0.0452938349135597 11.1776967170401
chr6 63275283 63275283 * 6.09301763917949e−06 0.0756170476360577 11.1965358784023
chr17 1907284 1907284 * 5.13254724852875e−06 0.0679910831660104 11.2093171817718
chr2 233446634 233446634 * 8.76571078818957e−06 0.0923419046394581 11.3687985654513
chr2 11640232 11640232 * 3.03783376535419e−06 0.0498881504472339 11.4973614775726
chr20 2656783 2656783 * 5.57071321764643e−06 0.0714177784668368 11.5076228119706
chr5 52787012 52787012 * 4.4119824134481e−06 0.0624545294048918 11.5702479338843
chr8 7355190 7355190 * 5.35951186567492e−07 0.0156394701158399 11.5772188564551
chr4 182271973 182271973 * 7.59426845622977e−06 0.0842174212589503 11.747891833895
chr8 142317199 142317199 * 6.71634686800095e−08 0.00376376738969701 11.8573374191352
chr17 1934527 1934527 * 8.04223173595121e−06 0.0871834336493453 12.0394442864687
chr14 70594590 70594590 * 6.16017019652066e−06 0.0759290860445895 12.0613955513625
chr16 1656909 1656909 * 6.96339601592992e−06 0.0814227807271212 12.2328666175387
chr17 34581157 34581157 * 4.03611186194179e−06 0.0586012874071419 12.280701754386
chr2 210000856 210000856 * 3.33071513702361e−07 0.0113868017253078 12.4065640135907
chr9 60546715 60546715 * 5.03211836839035e−06 0.0673730497335436 12.4318026298755
chr2 23381331 23381331 * 5.23000265546139e−06 0.0686693537548019 12.6325799136859
chr8 7355178 7355178 * 1.96532863559155e−07 0.00785520757265003 12.6415175870711
chr5 140113195 140113195 * 1.11816262026038e−06 0.0254312941380053 12.6435781413157
chr10 80315534 80315534 * 1.82075329657621e−06 0.0350964660636285 12.9637817324343
chr11 124953770 124953770 * 4.92000761841497e−06 0.066860123729955 13
chrX 9890823 9890823 * 1.1633577890612e−09 0.000147063954011078 13.0329457364341
chr14 84399673 84399673 * 1.91946182418947e−06 0.0364816806920338 13.3036542141972
chr2 123826928 123826928 * 5.97866405297445e−06 0.075013789476032 13.3259328836137
chr5 28077923 28077923 * 5.30301458359882e−06 0.0987971092962942 13.3566928121384
chr12 125025025 125025025 * 9.18126714338813e−07 0.0225842389002894 13.4502923976608
chr8 7355019 7355019 * 2.65882870076947e−06 0.0452938349135597 13.5951885951886
chrUn_JTFH01 1964 1964 * 2.3152978161566e−06 0.0418120797258048 13.6407727750384
chr6 3761436 3761436 * 4.42004631218611e−07 0.0138082602428604 13.6571916744331
chr5 1974697 1974697 * 7.57778168074868e−06 0.0842174212589503 13.8895534290271
chr16 67909149 67909149 * 7.95236576760233e−06 0.0866277817184845 13.9528377298161
chr3 184039229 184039229 * 6.83171509345077e−06 0.0805545754455256 14.069581444948
chr20 3697355 3697355 * 3.94920081985913e−07 0.0128544689445323 14.2098487286772
chr8 7355171 7355171 * 4.83087840534921e−08 0.00291772924810652 14.2435063802068
chr16 88777480 88777480 * 2.68307969429307e−06 0.0452938349135597 14.2811780582678
chr1 11858760 11858760 * 2.64990567816795e−06 0.0452938349135597 14.2857142857143
chr12 53945167 53945167 * 5.67778816255337e−06 0.0721102179241014 14.5084201486128
chr3 166976648 166976648 * 4.70085258764814e−06 0.0646905627184063 14.5181525915471
chr1 64723293 64723293 * 7.55515993400678e−06 0.0842174212589503 14.8208556149733
chr3 140562546 140562546 * 3.80868590924468e−06 0.0565902170382831 14.94708994709
chr7 56814596 56814596 * 2.41315458749695e−06 0.0427849084025574 14.9909164943933
chr8 7355160 7355160 * 7.20572670030231e−06 0.0830173843820925 15.0429099349479
chr20 20365677 20365677 * 4.94057119021423e−06 0.0669721407005376 15.0537634408602
chr15 99997598 99997598 * 6.92179009759833e−06 0.0813341226839447 15.0636980592928
chr13 31609131 31609131 * 1.24378725749963e−06 0.0270437857433912 15.1135005973716
chr5 975987 975987 * 2.44683883092541e−06 0.0427849084025574 15.2380952380952
chr2 239939101 239939101 * 2.53035213850558e−06 0.0439438414227893 15.4233608336952
chr13 113453322 113453322 * 4.67067949625201e−06 0.0646024380866661 15.4558800473197
chr16 86734074 86734074 * 2.61208537226382e−06 0.0449326131402911 15.7174969073409
chr12 130437316 130437316 * 4.68840585590966e−07 0.0143172741726291 15.8701092595984
chr1 18938585 18938585 * 9.52484532910695e−06 0.0963911891443587 15.9091952439976
chr13 31339964 31339964 * 3.46548133975747e−06 0.0539758549115726 15.9462915601023
chr19 1625320 1625320 * 9.25421409217928e−06 0.0942489382327313 16.0058617601563
chr1 164576420 164576420 * 4.50915568038441e−06 0.0630098770047527 16.0969568294409
chr7 101245417 101245417 * 8.43570160713426e−06 0.0900875588758716 16.1290322580645
chr1 185649037 185649037 * 5.65890062099215e−06 0.0720386530776885 16.1350112184177
chr7 157441764 157441764 * 8.52564447818141e−08 0.00433116603623758 16.2108033875029
chr5 27120204 27120204 * 7.7351057034027e−06 0.085019457373052 16.2820512820513
chr17 9116020 9116020 * 2.04542042936614e−06 0.037887088288121 16.6060606060606
chr19 12495880 12495880 * 2.3693398810559e−06 0.0423657767191011 16.6666666666667
chr1 145156891 145156891 * 6.75413310027288e−07 0.0188247134508835 16.8543543543544
chr6 27235585 27235585 * 8.69555613449339e−06 0.0923185116238665 16.8707899571341
chr7 81071315 81071315 * 6.50420961415866e−06 0.0780472704441361 16.9085218380993
chr4 153452347 153452347 * 4.82430577592971e−06 0.0658890349368055 17.0630725863284
chr5 95879781 95879781 * 1.52151529640109e−07 0.00672407584453604 17.1367177007789
chr5 1876520 1876520 * 4.00331628585466e−06 0.0583408112075746 17.1789502127947
chr12 42611025 42611025 * 1.18993988524624e−07 0.00588022208162228 17.2442280099135
chrX 69128579 69128579 * 3.26090515374672e−06 0.0521339513270659 17.2504206942433
chr3 161161933 161161933 * 1.58290684073595e−06 0.0320192253929983 17.44.5682257441
chr14 83092425 83092425 * 3.84856319481191e−06 0.056847608020966 17.4449675531238
chr17 62138102 62138102 * 1.66242940637828e−07 0.00722927243522119 17.5211736237145
chr1 187703924 187703924 * 2.98553060428667e−06 0.0494776513261761 17.68144323779
chr9 93259190 93259190 * 3.50462162235937e−06 0.0544295174133111 17.7272727272727
chr8 7355161 7355161 * 3.98595037543776e−06 0.0582438855271366 17.7704396746774
chr8 7355162 7355162 * 7.45369592985245e−07 0.0197641983119014 17.8508173616816
chr2 83181431 83181431 * 9.45177758089121e−07 0.0229364851910676 17.9188712522046
chr3 75444754 75444754 * 6.93435129599055e−06 0.0814115906730275 17.972972972973
chr19 38312833 38312833 * 7.46242227283568e−06 0.0842174212589503 17.9806882102434
chr11 8597404 8597404 * 8.01557190189945e−06 0.0871416746789511 18.111611489776
chr20 31996615 31996615 * 2.02108559825635e−06 0.037887088288121 18.1457431457432
chr7 124033873 124033873 * 4.49103188953225e−06 0.0630098770047527 18.4265876335476
chr16 1129530 1129530 * 8.9506136900616e−06 0.0928502159932842 18.4817226790892
chr1 51847682 51847682 * 3.32580055148821e−06 0.0525531981597828 18.5538096436662
chr16 1552139 1552139 * 1.00722943110529e−05 0.0999525235500815 18.6243894627272
chrX 75523136 75523136 * 9.00356341640296e−06 0.0932215922404464 18.7106918238994
chr3 196170941 196170941 * 8.91496448738382e−06 0.0928502159932842 18.7320231910248
chrX 34656908 34656908 * 4.63214185737443e−06 0.0643971260169504 18.7730911330049
chr6 1410032 1410032 * 6.39716253879425e−10 9.0159288953031e−05 18.7904998227579
chr14 102210659 102210659 * 9.69287010965831e−06 0.0972415423588671 18.8873156529901
chr16 50283745 50283745 * 5.46617803547497e−06 0.0195713318720594 18.9184300958582
chr1 1049815 1049815 * 9.69593584253245e−06 0.0105974382457425 18.9326334208224
chr17 3911398 3911398 * 7.27294592850256e−07 0.0195713318720594 18.9765227739911
chr14 83092419 83092419 * 2.88536804578315e−07 0.0105974382457425 19.1532609326906
chr1 187703949 187703949 * 1.2928834327608e−06 0.0278881831302854 19.1663186942866
chr19 1999311 1999311 * 8.24411713869786e−06 0.0887089768390727 19.1836425021813
chr12 15590179 15590179 * 1.19964810544574e−06 0.0264008744159159 19.2240202768571
chr3 127776110 127776110 * 2.19620718085617e−06 0.0399534971832748 19.3798449612403
chr7 158992589 158992589 * 1.44855505795679e−06 0.0300262709961256 19.6808510638298
chr11 75428589 75428589 * 2.91638004118714e−07 0.0106394515449691 19.7830547341742
chr5 26485521 26485521 * 1.01370811184431e−06 0.0238540614629973 19.8306595365419
chr20 56466014 56466014 * 3.63026718570573e−06 0.0556939703438536 19.886831757202
chr17_GL3835 241564 241564 * 5.76537124182039e−08 0.00333396330815266 20.00777000777
chr19 41916208 41916208 * 7.8845601894565e−06 0.0860626034115315 20.0155561317086
chr5 73303234 73303234 * 1.09585329697467e−06 0.0250286170567613 20.0578034682081
chr22 40010046 40010046 * 3.72215581649801e−07 0.0124089588893108 20.0913764653141
chr15 79559260 79559260 * 6.82233762843546e−07 0.0188247134508835 20.2613037827485
chr21 8988098 8988098 * 5.88148889293313e−07 0.0170055555348376 20.286252354049
chr12 132283239 132283239 * 9.91702257153166e−06 0.0992756785483333 20.3076923076923
chr10 1210729 1210729 * 9.37593144196478e−06 0.095262512818719 20.3806303849929
chr2 145861437 145861437 * 3.27733497087068e−07 0.0112752229265504 20.4418769636161
chrX 64204264 64204264 * 7.3346745146342e−06 0.083759733446032 20.4679802955665
chr17 17700424 17700424 * 2.32648100362032e−06 0.0418749170562029 20.4968944099379
chr12 131003900 131003900 * 8.36629413960065e−08 0.00429030982645222 20.5016036133
chr12 125048987 125048987 * 1.94629478365732e−07 0.00783675450469636 20.5169003657418
chr8 144588938 144588938 * 6.28197362878975e−06 0.0762540805536126 20.5499025763152
chr2 145881433 145881433 * 9.15449340302414e−07 0.0225842389002894 20.6399509302807
chr2 239283565 239283565 * 2.30772904442503e−07 0.00902467161059745 20.6826855123675
chr22_KI27073 143529 143529 * 5.53148029093805e−06 0.0710824509385189 20.7572914889988
chrX 6002109 6002109 * 1.23406567416543e−06 0.026940169100084 20.8955223880597
chr4 74099957 74099957 * 9.18777386103428e−06 0.0941242256005261 20.9017923823749
chr1 193918912 193918912 * 7.13310101183261e−06 0.0828502671356697 21.0335917312661
chr11 75428550 75428550 * 2.44787706648499e−06 0.0427849084025574 21.0592114301415
chr12 7639875 7639875 * 3.67124877806243e−06 0.0558327283785535 21.1223735613979
chr8 650032 650032 * 8.9490725133759e−06 0.0928502159932842 21.1396574440053
chr22 5006106 5006106 * 6.48255006428493e−06 0.0780472704441361 21.25900573554
chr19 1999288 1999288 * 8.93878392187845e−06 0.0928502159932842 21.3197535090722
chr12 7639872 7639872 * 2.47253872752689e−06 0.0430774414393711 21.3312224669604
chr7 2508541 2508541 * 1.28858699058934e−06 0.0278881831302854 21.3495440729483
chr14_KI27072 23560 23560 * 5.33381728958435e−06 0.0156394701158399 21.3945352554436
chr9 41235357 41235357 * 9.50421630768037e−06 0.0963911891443587 21.4494005832163
chr6 163834777 163834777 * 8.55894646117047e−06 0.0912245071392597 21.5409658766034
chr12 916398 916398 * 3.40478209945203e−06 0.053336098016097 21.5830721003135
chr7 94394113 94394113 * 4.97194331814183e−06 0.0672194785204955 21.588628077506
chr12 7639881 7639881 * 1.43892972296886e−06 0.0300026320652593 21.6060425362751
chr1 91949051 91949051 * 6.61331824427898e−06 0.079181725475721 21.6992586912065
chr16 11665637 11665637 * 4.79991926795308e−07 0.014495153532688 21.7257634761222
chr16 73063946 73063946 * 7.07115324323374e−08 0.00388254454036219 21.9392372333549
chr18 5892049 5892049 * 1.95792237866169e−06 0.0370830089196772 21.9876292264988
chr2 231393387 231393387 * 1.00096505921184e−05 0.0999525235500815 22.0759944587374
chr10 49679809 49679809 * 2.04916612734591e−06 0.037887088288121 22.1973586064675
chr2 52859724 52859724 * 3.02898732644796e−06 0.0498881504472339 22.3829449534666
chr17 41825406 41825406 * 1.09006398198915e−06 0.0250014407241902 22.4517374517375
chr14 83092404 83092404 * 1.13685810194342e−08 0.000997750739317906 22.5073336421183
chr13 113472325 113472325 * 9.01828736005716e−07 0.0224868678698749 22.5661375661376
chr11 62433748 62433748 * 1.62887198291174e−06 0.0325593013379642 22.6699203601763
chr20 3752446 3752446 * 2.75225148985962e−06 0.0463176999691646 22.7400732188116
chr12 117155454 117155454 * 1.16700980412723e−06 0.0258309554975365 22.7479320061813
chr3 12882861 12882861 * 6.13496980156831e−06 0.0759290860445895 22.8274428274428
chr12 131003812 131003812 * 4.28713009541067e−06 0.0611650139297923 22.8840454271251
chrX 50469150 50469150 * 1.08239958475546e−06 0.0249308454255277 23.0175652350492
chr7 158996842 158996842 * 4.7281330174762e−08 0.00291084454940201 23.0547453254877
chr11 75428635 75428635 * 8.9348440257961e−06 0.0928502159932842 23.1064187414644
chr16 73063947 73063947 * 1.48923673268711e−07 0.00668118895524527 23.115319679953
chr5 85941 85941 * 3.25399127254312e−07 0.0112662172965075 23.1763285024155
chr1 1296199 1296199 * 1.00738486641878e−06 0.0238083324559683 23.2140226838531
chr1 1426345 1426345 * 6.04666757853652e−06 0.0752135681501139 23.4122807017544
chr2 131658207 131658207 * 5.00031003540981e−06 0.0672194785204955 23.5658042744657
chr15 40069353 40069353 * 3.20644880283443e−06 0.0518442908913507 23.6060161408657
chr17 62701570 62701570 * 7.28005900000395e−06 0.0834868258031381 23.8938053097345
chr7 1169891 1169891 * 1.73669031418151e−11 5.24458669771092e−06 24.1511961722488
chrX 149938294 149938294 * 1.16360309167929e−06 0.0258309554975365 24.1581137309293
chr16 737792 737792 * 3.63726664722779e−06 0.0556939703438536 24.164868233077
chr7 56814628 56814628 * 3.74384449730062e−07 0.0124089588893108 24.2185299915517
chr17 80187604 80187604 * 8.30191286681125e−07 0.0212864726941878 24.243450767841
chr1 237862175 237862175 * 7.51861077966319e−06 0.0842174212589503 24.2988352042791
chr16 22949123 22949123 * 7.22697368386513e−06 0.0830532677577977 24.3350874697767
chrX 8003769 8003769 * 2.4290362673546e−06 0.0427849084025574 24.5106148331955
chr2 238143670 238143670 * 7.32931884194799e−07 0.0196230345936683 24.5531692978719
chr7 138008755 138008755 * 3.30418247278432e−06 0.0525531981597828 24.7375666838754
chr11 75428433 75428433 * 2.83303008784382e−06 0.0475299731256765 24.782824522214
chr17 4656719 4656719 * 2.89013508024201e−08 0.00206712120074571 24.8588709677419
chr3 80770400 80770400 * 7.46791709938012e−06 0.0842174212589503 24.9122807017544
chr2 239939031 239939031 * 2.04549100511889e−06 0.037887088288121 24.9192492361414
chr11 75428554 75428554 * 1.3935170521631e−06 0.0293598595376156 24.9411589822362
chr22_KI27073 152007 152007 * 7.89282518458512e−08 0.0041787742269341 25.0632911392405
chr10 21500180 21500180 * 2.95144882844735e−06 0.0491844195334005 25.120504263997
chr9 28147464 28147464 * 6.14879791054759e−06 0.0759290860445895 25.1282051282051
chr7 903302 903302 * 7.64296030512978e−06 0.0844418798976758 25.1683501683502
chr11 75428448 75428448 * 1.34074138481072e−06 0.028692784095583 25.3016935402089
chr17 41825404 41825404 * 1.49951976027687e−07 0.00668118895524527 25.4486613709915
chr18 5892057 5892057 * 8.99086690192464e−08 0.00452521486508727 25.4501624567655
chr9 65675170 65675170 * 1.80644306745188e−06 0.0349445418453209 25.7521186440678
chr16 963991 963991 * 4.40906992247558e−11 9.98612818698948e−06 25.7884749825227
chrX 75523150 75523150 * 1.81232645612604e−07 0.00757799771517082 25.8129338691998
chr9 81508657 81508657 * 7.73228766081226e−07 0.0203048155686783 25.904241478012
chrUn_KN7075 95 95 * 8.33036425580317e−06 0.0891377508508351 25.9320629660315
chr21 8988154 8988154 * 3.58364466696833e−06 0.055183803132989 25.9720837487537
chr21 8988157 8988157 * 4.15234750866201e−08 0.00275258812518899 26.1049131087749
chr18 5892066 5892066 * 8.60242561973568e−07 0.0218508605236883 26.175873192879
chr17 49014937 49014937 * 4.46218606294549e−06 0.0628379144666358 26.1784602169011
chr6 17988707 17988707 * 1.8538964252743e−08 0.00150408387868641 26.3025210084034
chr16 1458835 1458835 * 5.00408209342069e−06 0.0672194785204955 26.3317510548523
chr6 134904851 134904851 * 2.94324128097444e−07 0.0106658529538984 26.3739008792966
chrX 153794709 153794709 * 7.36435888848838e−07 0.0196230345936683 26.5266731006024
chr7 157994921 157994921 * 3.73604051238001e−06 0.0562556007760165 27.2452830188679
chr22_KI27073 135426 135426 * 7.48647549334856e−06 0.0842174212589503 27.2573124205172
chr9 133918282 133918282 * 8.45093060640189e−07 0.0215668304535604 27.3138626079802
chr19 39891862 39891862 * 1.06649164676383e−06 0.0247743913021197 27.363330529857
chr7 39753899 39753899 * 4.02968560026197e−07 0.0129796760687627 27.4255007327797
chr16 1459220 1459220 * 5.12889046802757e−07 0.0152219475968595 27.6666666666667
chrX 18354020 18354020 * 7.83764850142076e−07 0.0204825413197894 27.7777777777778
chr7 1169927 1169927 * 3.9798887156175e−08 0.00269140230907561 27.8989898989899
chr7 1169866 1169866 * 1.13802644918111e−08 0.000997750739317906 28.0286200628949
chr20 63273801 63273801 * 6.61704195900469e−07 0.0186366552526771 28.2304900181488
chr10 133367962 133367962 * 4.67884402276295e−07 0.0143172741726291 28.2661782661783
chr7 1169868 1169868 * 1.60957645255484e−08 0.00132565058057724 28.4219292158223
chr11 69649130 69649130 * 6.24597690994856e−06 0.0762540805536126 28.6509376890502
chr22_KI27073 135525 135525 * 9.69120461941616e−07 0.0232066951977543 28.7601160482516
chr7 157999053 157999053 * 1.94415110855558e−08 0.00155411265411141 28.8235294117647
chr2 171875082 171875082 * 4.438847915766e−06 0.0625716210142209 28.8461538461538
chr1 124207129 124207129 * 9.96140853470557e−07 0.0236453997887907 28.8875598086124
chr16 87833135 87833135 * 1.4445373427969e−07 0.00668118895524527 29.0027408812987
chr19 440707 440707 * 8.7413787366665e−06 0.0923419046394581 29.0465631929047
chr7 1169837 1169837 * 3.32174498286576e−06 0.0525531981597828 29.0989145124025
chr22_KI27073 135472 135472 * 9.6300489607308e−06 0.0971183112420548 29.1907930228376
chr15 25802207 25802207 * 3.67714271408674e−06 0.0558327283785535 29.2625893853906
chr22 49925463 49925463 * 1.16899912549602e−06 0.0258309554975365 29.3992557150452
chr7 1169926 1169926 * 2.60575011184293e−14 2.02346695835041e−08 29.5172629276676
chr13 27425174 27425174 * 7.58145392462663e−06 0.0842174212589503 29.6590909090909
chr14 100440442 100440442 * 9.29761562457193e−07 0.0226635593157138 29.7067669172932
chr14 104159421 104159421 * 7.26336992160453e−07 0.0195713318720594 29.7872340425532
chr22 42692830 42692830 * 3.71006794805588e−06 0.0561757406414359 29.812030075188
chrX 65667643 65667643 * 2.65560720339432e−06 0.0452938349135597 29.8179535467671
chr5 2564529 2564529 * 3.29438357772334e−10 5.42652260539864e−05 29.9133239266277
chr22_KI27073 135492 135492 * 3.25156679174846e−06 0.0521339513270659 29.9465131471501
chrX 75523159 75523159 * 3.66123615248956e−06 0.0558327283785535 30
chr12 130407617 130407617 * 4.8374970236187e−07 0.0145111594413236 30.2255639097744
chr7 157999054 157999054 * 1.09309148451844e−10 2.20066602635805e−05 30.2298640326809
chr7 1169850 1169850 * 9.29379730743095e−07 0.0226635593157138 30.2420207144367
chr22_KI27073 135541 135541 * 1.86822179371111e−06 0.0357942136078405 30.4802442861109
chr22 36564600 36564600 * 2.38359128753439e−07 0.00913087271952168 30.6029945601255
chr8 141275243 141275243 * 4.81264100959551e−06 0.0658890349368055 30.6732970242324
chr12 2106958 2106958 * 3.05697247806909e−07 0.0110046433874195 30.7692307692308
chr7 1169849 1169849 * 7.20857883869583e−06 0.0830173843820925 30.9466445816393
chr22_KI27073 135506 135506 * 8.88064814109745e−06 0.0928502159932842 31.0674989420229
chr22_KI27073 135436 135436 * 8.2675165433281e−08 0.00428003347340884 31.0833390165723
chrX 17737212 17737212 * 8.71467580601656e−06 0.092341146520027 31.1657414750198
chr13 23570923 23570923 * 9.94118301283143e−07 0.0236453997887907 31.3373312168068
chr12 132174708 132174708 * 1.66855094619531e−06 0.0330656096982751 31.4325314325314
chr17 80182491 80182491 * 1.71280817284708e−07 0.0073892366134521 31.541344735667
chr15 75723716 75723716 * 1.47612094080113e−07 0.00668118895524527 31.5940929070613
chr6 66289559 66289559 * 2.89477259315682e−06 0.0484163987624443 31.8881827209533
chr7 1169859 1169859 * 2.4020770526729e−07 0.00913087271952168 31.8884408602151
chr17 50199609 50199609 * 3.21417447274026e−06 0.0518442908913507 31.9202695115104
chr7 1169867 1169867 * 1.12262169300261e−09 0.000145293281206181 32.0375494071146
chr19 46468105 46468105 * 1.87012128624676e−06 0.0357942136078405 32.1556500450973
chr7 1170095 1170095 * 4.20782744107426e−06 0.0603591447467553 32.2055137844612
chr5 9725687 9725687 * 2.15438836789597e−06 0.0394301962158631 32.2835738068813
chr7 1169892 1169892 * 9.91961448225861e−15 8.98679708211414e−09 32.3045072595159
chr6 88047529 88047529 * 1.25181084625915e−07 0.00613023500091303 32.3684210526316
chr1 55038977 55038977 * 5.1879492291957e−06 0.0682821296207797 32.3809523809524
chr11 39271373 39271373 * 1.14115450774726e−06 0.0257160290067815 32.5301204819277
chr1 236762554 236762554 * 3.81746516099194e−06 0.0565602170382831 32.5496414782129
chr7 1169860 1169860 * 2.90356982453531e−11 7.14415879972436e−06 32.8602570848681
chr20 62228803 62228803 * 3.79264790329375e−06 0.0565902170382831 32.9224447868516
chr7 231461 231461 * 2.58712129926434e−06 0.0447866455203418 32.9752953813104
chr11 17534318 17534318 * 2.95878842413646e−06 0.0491844195334005 33.1380830533033
chr22 20192292 20192292 * 4.2084376317007e−06 0.0603591447467553 33.2174154958965
chr1 201115212 201115212 * 1.03413887580242e−06 0.0242299362649828 33.7993527508091
chr17 79949679 79949679 * 7.12309477177497e−06 0.0828502671356697 34.0579710144928
chr1 19068359 19068359 * 9.15802753255408e−09 0.0008437245102221 34.0660372927528
chr14 54366583 54366583 * 7.61179222371085e−14 4.94046139640635e−08 34.2294289168622
chr4 39530307 39530307 * 1.94501027008325e−07 0.00783675450469636 34.3064489526637
chr14 24316393 24316393 * 5.15260266122569e−07 0.0152219475968595 34.4149207971385
chr22 24551484 24551484 * 2.51821869600452e−08 0.0018251290285407 34.5132743362832
chr22_KI27073 135547 135547 * 5.20520235903264e−08 0.00304239824171702 34.5410628019324
chr11 8384488 8384488 * 1.07665556751145e−06 0.0249040695354639 34.7738057576716
chr4 8466568 8466568 * 8.87870673713955e−06 0.0928502159932842 35.0521581012537
chr7 1169916 1169916 * 1.90021721977424e−07 0.00778688637634848 35.1368363437329
chr15 25802206 25802206 * 7.91816851425781e−08 0.0041787742269341 35.2173913043478
chr22_KI27073 135441 135441 * 2.13797389688936e−06 0.0392619693290198 35.5576441102757
chr8 693717 693717 * 5.30148593625675e−06 0.0687971092962942 35.6578286499775
chr13 36840187 36840187 * 8.21287776817919e−07 0.0211579846138817 35.6624233851957
chr7 1169865 1169865 * 7.51432431054318e−12 2.72307791432124e−06 35.6695266557409
chr11 121964691 121964691 * 9.12389066304506e−06 0.0940664388454195 35.9448988377099
chr3 147245632 147245632 * 1.76882316043239e−09 0.000210081167672482 35.947585348783
chr7 155830996 155830996 * 3.11335275667481e−06 0.0508212671698537 35.958314772402
chr5 54821934 54821934 * 6.27202011073538e−22 3.40932838454125e−15 36.2454415728556
chr22_KI27073 135295 135295 * 4.86220251851834e−06 0.0662401855461066 36.2914212445406
chr2 239231857 239231857 * 2.04487229693304e−06 0.037887088288121 36.3297150610584
chr3 156099527 156099527 * 9.16963883562384e−06 0.0941242256005261 36.4396973742768
chr22 18080123 18080123 * 4.45835258666235e−07 0.0138483411848068 36.6120218579235
chr12 91924347 91924347 * 4.70477273752853e−08 0.00291084454940201 36.6836498378879
chr9 81508683 81508683 * 7.54923936734874e−06 0.0824174212589503 36.753463927377
chr5 54821933 54821933 * 2.08418167490149e−21 5.66457027985299e−15 36.8225524475524
chr3 164402171 164402171 * 7.75780061073801e−06 0.085019457378052 37.1922065503227
chr22_KI27073 135487 135487 * 6.81715173636459e−06 0.0805545754455256 37.516148970801
chr7 1169907 1169907 * 5.55429820086667e−10 8.3866415968105e−05 37.6223776223776
chr7 1169908 1169908 * 3.60840314192082e−09 0.000384597332948461 38.0913978494624
chr12 43225358 43225358 * 5.88670366048575e−06 0.0744157923334262 38.3492637561354
chr17 79088193 79088193 * 3.52427394918426e−06 0.0545787940223735 38.3574879227053
chr18 68880349 68880349 * 2.70694587075847e−11 7.17415879972436e−06 38.4508211300172
chr7 1169917 1169917 * 4.69425789360788e−10 7.29055000210528e−08 38.6078685689581
chr8 144355151 144355151 * 3.19764746170857e−07 0.0111421082907189 38.6296056884292
chr5 58069392 58069392 * 8.23419561991435e−09 0.000771710801062833 38.6675552922591
chr15 20148130 20148130 * 9.13707591786956e−06 0.0940664388454195 39.025974025974
chr14 104945443 104945443 * 1.16290174356482e−06 0.0258309554975365 39.1525773499801
chr10 129899628 129899628 * 6.15352324330379e−15 6.68983233086929e−09 39.5831963710473
chr1 3716165 3716165 * 8.09619152365043e−09 0.000771710801062833 39.9122967964478
chr14 107946866 107946866 * 3.08231924412474e−07 0.0110228886887585 40.0182149362477
chr16 48577998 48577998 * 1.29756787051698e−07 0.00624264840344132 40.1739130434783
chr8 693718 693718 * 5.97422882017508e−07 0.0171823056564859 40.1742160278746
chr11 70782840 70782840 * 3.72801252040681e−06 0.0562556007760165 40.1827872888966
chr12 128157922 128157922 * 6.77675141568647e−07 0.0188247134508835 40.3296913935212
chr13 18250663 18250663 * 7.57785879186208e−09 0.000735562996365631 40.688758934373
chr10 1180765 1180765 * 9.62996488757116e−07 0.0231620852906072 40.893277971047
chr9 138175476 138175476 * 1.20630505355594e−06 0.0264403292184999 41.2218733647305
chr14 104944075 104944075 * 4.52076411415441e−06 0.0630098770048527 41.2292470233104
chr4 3471922 3471922 * 2.66957312530285e−06 0.0452938349135597 41.2751996957018
chr6 67825668 67825668 * 2.84475992745567e−11 7.17415879972436e−06 41.6066115702479
chr10 128403039 128403039 * 6.29865814299345e−06 0.0762540805536126 41.6898120194823
chr5 180296704 180296704 * 3.13838321709385e−07 0.0110853389524534 42.2498554077501
chr11 88069244 88069244 * 5.14081990554248e−06 0.0679910831660104 42.7058111380145
chr15 41847967 41847967 * 1.40527755050629e−06 0.0294933249877442 43.010752688172
chr6 148346764 148346764 * 7.16582237651837e−07 0.0195713318720594 43.4297769740808
chr6 33502742 33502742 * 3.89132376285721e−06 0.0570144381016749 43.940586972083
chr14 23580472 23580472 * 3.78802372723624e−11 8.95253951647558e−06 44.0277777777778
chr2 240642627 240642627 * 3.53239100462726e−09 0.000384025583615735 44.1948991244766
chr2 239231873 239231873 * 8.05974095048366e−07 0.0208949057036708 44.2575650950035
chr7 2250267 2250267 * 1.35028756226762e−06 0.028783756954893 44.2706522998494
chr7 155820590 155820590 * 4.04275136855915e−06 0.0586012874071419 44.8245080787074
chr19 4217590 4217590 * 3.384452201028E−07 0.0114981982991192   46.9534050179211 chr3 149283991 149283991 * 7.44387831389826e−08 0.0040463240197852   47.0008302200083 chr7 155820591 155820591 * 8.91796912180136e−06 0.0928502159932842   47.3914098519777 chr2 180232376 180232376 * 1.06477453395238e−13 5.78787372752045e−08   47.7210652099693 chr14 23580473 23580473 * 1.71813177977381e−12 7.78281338089017e−07   48.0059523809524 chr16 86738680 86738680 * 3.14056875557709e−07 0.0110853389524534   48.1323489198797 chr1 9282145 9282145 * 1.40359488987554e−07 0.00663445618166811   48.1624758220503 chr18 514264 514264 * 1.00811463765718e−04 0.0999525235500815   48.4543010752688 chr15 90825733 90825733 * 1.43789090093169e−16 2.60534999137369e−10   48.7622630000508 chr2 3125151 3125151 * 6.28666306926627e−06 0.0762540805536126   49.2557911571996 chr6 150098459 150098459 *  7.2061840532212e−06 0.0830173843820925   49.4505494505495 chr4 8464685 8464685 * 2.43894396005617e−06 0.0427849084025574   49.546485260771 chr4 39530308 39530308 * 2.53346837886689e−10 4.30355048239587e−05   49.8145604395604 chr2 239231852 239231852 * 2.65292963916663e−07 0.00987720963364853   51.1407829919627 chr7 561796 561796 * 1.83706073120751e−07 0.00762278393825859   51.3027557391672 chr5 30014417 30014417 *   3.823936147381E−10 6.11354490811522e−05   51.8518518518518 chr6 1515332 1515332 * 8.51018543688203e−10 0.000115648611832455   51.854527938343 chr14 68875188 68875188 * 2.43028391168425e−06 0.0427849084025574   52.2673594709495 chr1 42736748 42736748 * 1.63011693494085e−06 0.0325593013379642   52.8028933092224 chrX 144230633 144230633 * 5.98920862096141e−06 0.075013789176032   53.154305200341 chr3 164407943 164407943 * 7.47032136289704e−06 0.0842174212589503   53.5544445799293 chr11 63047952 63047952 * 1.71838145006591e−06 0.033721058513901   53.5555555555556 chr19 22925603 22925603 *  4.7138795638219e−07 0.0132686827348675   53.5714285714286 chr17 1934230 1934230 * 4.58837847730844e−08 
0.00291084454940201   54.3565147881694 chr1 21384907 21384907 * 6.21719939671562e−06 0.0762540805536126   55.1418439716312 chr1 151852673 151852673 * 7.83454128121666e−06 0.0856877179041533   56.5939278937381 chr1 151852674 151852674 *  9.2588347154018e−06 0.0942489382327313   57.2693383038211 chr20 17699900 17699900 *  6.0308846261762e−07 0.0172539609726149   57.6095829931859 chr16 88646325 88646325 * 4.76592965227729e−08 0.00291084454940201   58.0498866213152 chr16 4756184 4756184 *  4.4860752513821e−07 0.0138552790985831   58.7208180590392 chr17 1934312 1934312 *  3.5584003055642e−07 0.0119399134954185   59.1778810863714 chr19 22925602 22925602 * 6.41324534532623e−07 0.018156745991534   59.3444227005871 chr17 1934311 1934311 * 7.40598841836225e−11 1.54835689572441e−05   59.876395478158 chr10 29586143 29586143 * 3.46369267902865e−07 0.0116943171482325   60.6714951650807 chr10 29586144 29586144 * 1.44058361680097e−06 0.0300026320652593   60.8906525573192 chr7 83749828 83749828 * 2.39873934363701e−07 0.00913087271952168   61.5900743949524 chr20 17699899 17699899 * 1.59150439114324e−07 0.00697665983085667   63.9708252740168 chr14 104944816 104944816 * 1.45430430995669e−06 0.0300262709961256   66.124287552859 chr10 29585703 29585703 * 1.52353443754042e−06 0.0310171868302878   66.4870689655172 chr22 19443961 19443961 * 4.08318044083221e−07 0.0129796760687627   66.9160702667534 chrX 144230634 144230634 * 1.49666439380404e−07 0.00668118895524527   67.1428571428572 chr14 104950312 104950312 * 2.50669658277121e−07 0.0093971283520804   67.2663383396421 chr9 97132113 97132113 * 1.45829012445645e−06 0.0300262709961256   72.1501545906708

indicates data missing or illegible when filed
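Each row in these tables lists a chromosome, a start and end coordinate, a strand field, a p-value, a q-value, and a methylation difference (meth.diff). As an illustration only, the following Python sketch parses such rows and keeps the sites that pass a q-value cutoff and a minimum absolute methylation difference. The helper names (`DiffSite`, `parse_row`, `select_sites`) and the default cutoffs (`q_max=0.01`, `min_abs_diff=30.0`) are hypothetical choices for this example, not thresholds stated in this document. One practical detail: the tables print minus signs as the Unicode character U+2212, which Python's `float()` does not accept, so the sketch normalizes it first.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class DiffSite:
    """One row of a differential-methylation table (hypothetical container)."""
    chrom: str
    start: int
    end: int
    strand: str
    pvalue: float
    qvalue: float
    meth_diff: float


def _to_float(text: str) -> float:
    # The tables use U+2212 (MINUS SIGN) instead of ASCII '-'; float() rejects it.
    return float(text.replace("\u2212", "-"))


def parse_row(line: str) -> DiffSite:
    """Parse a whitespace-separated 'chr start end strand pvalue qvalue meth.diff' row."""
    fields = line.split()
    if len(fields) != 7:
        # Damaged rows (e.g. a name split into two tokens) fail loudly here.
        raise ValueError(f"expected 7 fields, got {len(fields)}: {line!r}")
    chrom, start, end, strand, p, q, diff = fields
    return DiffSite(chrom, int(start), int(end), strand,
                    _to_float(p), _to_float(q), _to_float(diff))


def select_sites(sites: Iterable[DiffSite], q_max: float = 0.01,
                 min_abs_diff: float = 30.0) -> List[DiffSite]:
    """Keep sites with qvalue < q_max and |meth_diff| >= min_abs_diff,
    ordered from the largest to the smallest absolute difference."""
    kept = [s for s in sites
            if s.qvalue < q_max and abs(s.meth_diff) >= min_abs_diff]
    return sorted(kept, key=lambda s: abs(s.meth_diff), reverse=True)


# Four rows copied verbatim from the tables in this document.
SAMPLE_ROWS = [
    "chr7 1169850 1169850 * 9.29379730743095e−07 0.0226635593157138 30.2420207144367",
    "chr5 54821934 54821934 * 6.27202011073538e−22 3.40932838454125e−15 36.2454415728556",
    "chr9 97132113 97132113 * 1.45829012445645e−06 0.0300262709961256 72.1501545906708",
    "chr5 2564583 2564583 * 3.35175966322353e−12 1.40425632243613e−06 −66.2912949119846",
]

if __name__ == "__main__":
    sites = [parse_row(row) for row in SAMPLE_ROWS]
    for site in select_sites(sites):
        print(site.chrom, site.start, f"{site.meth_diff:+.1f}")
```

With the default cutoffs, only the two sample sites whose q-values fall below 0.01 are kept. The explicit field-count check is deliberate: rows whose fields have been split or merged raise `ValueError` instead of being parsed into the wrong columns.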

TABLE 4
Severe Preeclampsia versus Mild Preeclampsia
chr start end strand pvalue qvalue meth.diff
chr13 58885851 688858511 * 4.92448123739017e−08 0.00318907968454339 −72.3861566484517
chr5 2564583 2564583 * 3.35175966322353e−12 1.40425632243613e−06 −66.2912949119846
chr4 6534733 6534733 * 3.13578949268644e−06 0.0532057706716535 −59.1775700934579
chr5 2564423 2564423 * 4.42484745942106e−06 0.0686806731879121 −58.7253205920312
chr11 126741853 126741653 * 1.18515862828247e−06 0.0301337427763311 −57.8643578643579
chr1 34795565 34795565 * 4.12823404811012e−10 6.81353616416951e−05 −55.0250515767757
chr14 95198847 99196847 * 5.12113920067973e−06 0.0734006412826014 −54.9138321195145
chr2 229932965 239932965 * 7.92297576639817e−12 2.47746930670009e−06 −54.7993616051072
chr14 99188646 391386461 * 1.54239463709976e−08 0.0013126000080948 −51.7150165688041
chr5 13244533 13244533 * 1.57986340814749e−06 0.0361306254380479 −49.7589914720059
chr3 174374968 174374968 * 6.20657917700822e−16 5.63410474735697e−10 −49.6864893406376
chr5 2564401 2564401 * 5.95224783573461e−07 0.0195046630119335 −49.6863260869565
chr11 8031611 8031611 * 2.41487944068427e−11 5.48025898110474e−06 −47.8846153846154
chr5 136082613 136082613 * 2.35725340053878e−06 0.0455275240709477 −46.4914772727273
chr5 136082619 136082819 * 8.75539719393383e−06 0.0859654597513623 −46.3641778114679
chr5 13244532 13244532 * 3.35120482949683e−07 0.0137235415745884 −46.1070559610706
chr3 126502264 126502264 * 2.09149986887952e−08 0.0422028306781269 −45.4273297923708
chr2 239932966 239932966 * 4.57260799247884e−10 7.11562081433487e−05 −44.4166382284913
chr3 126542502 126542502 * 4.54232170941777e−10 7.11562081433487e−05 −43.9285714285714
chr5 2564529 25644529 * 4.31565941655994e−24 1.17526822435092e−17 −43.2240348803556
chr13 19390659 19390659 * 6.52490963323898e−09 0.000658108668854797 −43.2237045492378
chr11 8031859 8031859 * 4.86986241988084e−19 5.65837844905373e−13 −42.5545654139222
chr22_KI2 135287 135287 * 8.80504264645363e−12 2.52403128071413e−08 −42.4917734251393
chr16 85319282 85319282 * 8.40300792441353e−06 0.0993896011935138 −42.0016927834363
chr1 96224399 96224399 * 6.68037719914728e−06 0.085610672875619 −41.7702864511375
chr22_KI2 135316 135316 * 1.33407106527642e−06 0.0320086431033939 −41.5205419897109
chr7 132311422 132311422 * 1.29523988473596e−10 2.43259092079145e−05 −41.3749691971308
chr22_KI2 135289 135289 * 5.62986053025345e−09 0.00059583211509241 −40.888906123694
chr7 69645871 69645871 * 2.20166404346001e−10 3.86617598980406e−05 −40.8488063660477
chr22_KI2 135295 135295 * 2.27813051386405e−06 0.0446324446716418 −40.7610787940212
chr7 3987779 3987779 * 8.06746169190422e−07 0.0241425065523422 −40.6579217654136
chr5 107784431 107784431 * 3.53979592069215e−09 0.000426432670304054 −40.561767752809
chr16 87996495 87996495 * 5.93550806535499e−06 0.0805044089124615 −40.3094676108051
chr22_KI2 135291 135291 * 3.68320793314607e−11 8.02422501688664e−06 −40.307328605201
chr1 24599267 24595267 * 1.04025218242041e−24 5.68572518953531e−18 −39.9441212492345
chr11 5441676 5441676 * 6.89214054567867e−09 0.000849866181489538 −30.9146606811145
chr10 132375370 132375370 * 7.01631165267647e−08 0.00409210784865686 −39.8526204977618
chr10 16030596 16030596 * 6.74081377014693e−06 0.08998854597513623 −39.7912685862432
chr22_KI2 135285 136285 * 1.72564062131233e−06 0.0014027892296795 −39.7138013861285
chr22_KI2 135299 136299 * 1.23254023023791e−10 2.39750732273891e−05 −30.3233921687989
chr10 507466 507466 * 7.00810033049266e−06 0.0863554684843166 −39.2650258987955
chr3 191332815 191332815 * 6.88651252420596e−07 0.0213109850082429 −39.0554722638681
chr7 17786743 17788743 * 6.00864383835461e−05 0.0805044069124615 −38.4567630733275
chr2 3059465 3050465 * 3.07734839194474e−06 0.0531842024802153 −38.2179129298569
chr7 3987668 3987668 * 1.11040068873981e−09 0.000151929561318161 −38.2164050801398
chr18 6986228 6986228 * 2.05113183366007e−10 3.72382437433588e−05 −38.1118881118861
chr10 337807 337807 * 6.00704637134146e−05 0.0605044069124615 −38.0769230769231
chr7 3987667 3987667 * 7.4742522431422e−07 0.0227421536582436 −37.6096308400307
chr22_KI2 135264 135284 * 4.61660991055785e−10 0.0168828992811911 −36.975649682832
chr11 8031860 8021360 * 6.94997205283417e−10 0.000128276818910468 −38.9689632401497
chr22 25357353 25357353 * 1.74524969943093e−06 0.0383209636441827 −36.7647058823529
chr13 97774924 97774924 * 5.98051361144579e−07 0.0195046830119335 −36.6323417238749
chr22_KI2 135293 135293 * 2.78209747912251e−07 0.0119312375301267 −30.4374112396024
chr11 890198 890108 * 2.464399089143e−06 0.0470492486041918 −36.2764556072715
chr5 136082560 136082560 * 2.62585999305225e−06 0.048774456437286 −38.231884057971
chr22_KI2 135232 135263 * 1.64713863102e−07 0.00793905077587731 −36.0171635515738
chr4 1027464 1027464 * 8.64024183408533e−08 0.00480193959462267 −35.8465808485608
chr13 111424321 111424321 * 2.56225580563267e−06 0.0481217439563169 −35.8437566249735
chr5 3366831 3388831 * 7.76032003656057e−06 0.00440276260380905 −35.7737653557589
chr4 8691677 8691677 * 4.48814647635642e−06 0.0686581799695674 −35.6140729548888
chr1 24595268 24695268 * 8.18773763380201e−12 2.47746930670009e−06 −36.508229163011
chr7 904312 904312 * 2.82356615985535e−06 0.0505870922243516 −36.4920100925147
chr9 95680618 95680616 * 4.98560458056026e−07 0.0175117800967409 −35.4768786127168
chr10 114489621 114489821 * 3.37242998390943e−06 0.055829522577273 −35.4368932038835
chr3 75792481 75792481 * 8.45615169039445e−06 0.0993698011935136 −35.4193341869398
chr22_KI2 135281 135281 * 2.46360164963479e−06 0.0470492486041918 −35.3036138332256
chr10 132311421 132311421 * 1.25312114276786e−19 2.27503809303867e−13 −35.2910208894879
chr15 25802206 25802206 * 4.68220148803728e−09 0.000531282769728816 −35.2173913043478
chr10 132375369 132375369 * 9.17165642026229e−07 0.0254884047550052 −34.8463356973995
chr8 23263546 23263546 * 6.45430024317848e−07 0.0203198234913697 −34.8381328816111
chr2 312992 312991 * 4.25324025626474e−06 0.0663760430653978 −34.1097480106101
chr1 58910511 56510611 * 2.1559377456398e−07 0.90954856348302861 −33.2446669796067
chr22_KI2 135218 135218 * 4.6556930481352e−06 0.0698771984183716 −33.0687713290467
chr11 5441675 5441075 * 7.83599690933602e−09 0.000762119544262917 −32.6813559322034
chrUn_JTF 692 692 * 2.38165501919187e−09 0.000301666627604682 −32.3000764144988
chr11 69641427 69841427 * 6.83625937953498e−07 0.0247250346151797 −32.2670250896057
chr18 6988227 6986227 * 9.83311674288533e−07 0.026644771977707 −31.6245954692557
chr4 147741995 147741995 * 2.7309153965415e−06 0.00209491674083664 −31.5789473684211
chr22_KI2 135214 135214 * 1.81370521147475e−06 0.0362298238733574 −31.1381194618577
chr2 240900827 240900827 * 1.00272860602628e−07 0.00546135333089328 −31.09825620389
chr7 82335540 82335540 * 4.66000575787923e−11 9.7618015693243e−06 −31.0788904579903
chr3 126540525 126540525 * 5.59460969374949e−08 0.003385668326681 −30.9444613094448
chr5 2564154 2564154 * 3.31561199169442e−10 5.64326693370866e−05 −30.327868852459
chr22_KI2 135184 135184 * 2.15115023860617e−07 0.00954658348302861 −29.8768020869656
chr13 19390621 19390621 * 6.22500411692352e−06 0.0824322876531253 −29.8389129468325
chr13 114199991 114199991 * 7.36305132975948e−06 0.0909360547916652 −29.7918886668861
chr14 104159421 104159421 * 8.97817613899101e−07 0.0252059810905183 −29.7872340425532
chr22 36810296 36610298 * 9.7925578165256e−07 0.026644771977707 −29.7376093294461
chr1 1174524 1174524 * 3.80036405270521e−08 0.00254668935196708 −29.6628475424125
chr17 76888520 76888529 * 2.69924920290801e−08 0.00209491674083664 −29.457437742638
chr11 82736713 82736713 * 3.833979211525490-08 0.002671458253075 −29.4373162256081
chr2 67894619 87694619 * 1.24857832641374e−06 0.0310519261803968 −29.3589743589744
chr15 25802207 25602207 * 9.0939310259263e−07 0.0254000115885122 −29.0611783956948
chr13 111424320 111424320 * 1.46313496403555e−07 0.00737866007087029 −29.012526652452
chr16 5861285 5661285 * 3.06745002204275e−06 0.0531842024802153 −28.8888888888888
chr2 240900828 240900828 * 1.93972594653999e−07 0.00880391820835207 −28.8602941176471
chr5 3386320 3366320 * 9.77734786435600e−09 0.000892455196703998 −28.7586440350114
chr2 21043838 21043838 * 7.41918042410687e−06 0.091010150510033 −28.5491071428571
chr19 536536 536536 * 1.18952799286425e−06 0.0301337427763311 −28.3422459893048
chr2 71157442 71157442 * 3.28828585961074e−06 0.0547694881592753 −28.3006257974361
chr15 25177929 25177929 * 4.5248663580141e−08 0.0689077210559564 −28.2614881206431
chr2 21043849 21043849 * 2.93556654246345e−06 0.0520882886068996 −28.1921083253993
chr10 133079171 133079171 * 1.59209375268922e−06 0.0364305254380479 −27.8610694652674
chr10 43735564 43735564 * 1.63131665990732e−06 0.0384137432602935 −27.7803379416283
chr22_KI2 135337 135337 * 1.11579755422875e−09 0.000151929561318161 −27.5953389830508
chr22_KI2 151971 151971 * 1.59124062873556e−06 0.0361305254380479 −27.3683176100629
ch311 1310405 1310405 * 5.33609782299179e−06 0.0756849325628703 −27.2635914889336
chr1 1183524 1183524 * 7.71139187586468e−06 0.0935412787544808 −27.1919879082736
chr6 107986042 107986042 * 5.89672830903689e−08 0.08049244300033657 −27.0833323333333
chr12 34803294 34603294 * 4.74602141870145e−06 0.00311560998057664 −26.742994723631
chr22 50221241 50221241 * 6.37996028043536e−36 0.0833664425943691 −26.666666666667
chr11 73669807 75668807 * 3.22001077021541e−07 0.0133854753477995 −26.5592841183311
chr5 179558764 179559764 * 2.73908703612642e−06 0.0504675755135091 −26.4172335600907
chr11 66317171 65317171 * 5.06888594491374e−06 0.0730044536391841 −25.9856188461891
chr10 129392359 129692359 * 6.78295134636805e−06 0.0861148957126395 −25.8333333333333
chr1 143660428 143660428 * 1.66382915759149e−06 0.0372803334504422 −25.8642916191001
chr13 26721718 26721718 * 1.30995278279394e−06 0.0317095433019338 −25.4052684903749
chr21 24209067 24209067 * 1.65875894177013e−06 0.0368758395260794 −25.0793650793651
chr1 37761722 37761722 * 2.72052738051602e−06 0.00209491574083884 −26.0102847739635
chr12 34596867 34595867 * 3.24407480247751e−07 0.0133854753477995 −24.5849939351932
chr2 239124912 239124912 * 1.27843694629087e−07 0.00663142533378825 −24.4055944055944
chr17 50508777 59508777 * 1.22381168237542e−06 0.0307185051929208 −24.0464461646728
chr14 99644817 99644817 * 1.77708954393715e−06 0.0383239636441827 −23.9861060852437
chr1 1014255 10142555 * 1.05779778214713e−06 0.0276674136531402 −23.7411347517731
chr16 1813014 1813014 * 3.79009066345539e−06 0.060822916883671 −23.6842105263158
chr12 131003860 131003860 * 4.92083237806295e−07 0.0174034247925778 −23.6567185250021
chr21 45924346 45924346 * 1.27148608905986e−06 0.0311141973151349 −23.5351117191062
chr22_KI2 135190 135190 * 2.45948166748495e−07 0.0106313886873837 −23.1272260173994
chr17 541517 541517 * 1.84086759655569e−07 0.0084959183392371 −22.9565363881402
chr17 6657726 6657726 * 4.23135730054393e−08 0.0662242922027417 −22.9131853266402
chr17 76125688 78125668 * 5.46217618665255e−08 0.076326089849964 −22.8026533996683
chr12 131003874 131003874 * 3.95608413352707e−07 0.015958753482099 −22.5504599887583
chr17 82602604 82602804 * 6.60019338701713e−06 0.0847827841529265 −22.548393898251
chr5 179559762 179559782 * 9.93107668605606e−07 0.0267769949116706 −22.479420684533
chr2 21043749 21043749 * 6.78820398741178e−06 0.00401750062258783 −22.4768502155934
chr13 26721718 26721718 * 4.51847364258328e−06 0.0689877210559584 −22.1915430893842
chr12 131003900 131003900 * 9.83152307985395e−09 0.000892455196703998 −21.8302743900014
chr15 23562122 23562122 * 9.53224069739073e−07 0.0263539455332046 −21.8302175020004
chr13 26721799 26721799 * 5.68840647469251e−09 0.00059583211509241 −21.7505731530436
chr11 134123845 134123845 * 7.40755475669141e−08 0.091010150510033 −21.6851829489192
chr1 1174573 1174573 * 8.670S3493169424e−07 0.0247256346151797 −21.6071022080583
chr12 131003845 131003845 * 3.32142098906435e−08 0.0551527220901558 −21.3448439727548
chr5 179559730 179559730 * 4.4493731793013e−08 0.00302918442825989 −21.2669683257919
chr10 54197246 64197246 * 6.39350151946486e−07 0.0202454383824147 −21.1297440423654
chr6 168274189 168274189 * 7.62683233953893e−06 0.0927220565237502 −21.0458671379227
chr14 45254118 452541185 * 1.7675763034448e−06 0.0383260638441827 −21.0438883523241
chr5 179559727 17955972 * 5.18514010684151e−08 0.00323355106686028 −20.8166344476402
chr15 26670544 20670544 * 0.57081633245934e−06 0.0846049612014401 −20.7447642952388
chr5 1873843 1873848 * 5.29821095587927e−08 0.0755410041505466 −20.6499856197872
chr1 4 47626570 47626570 * 4.4855641752075e−06 0.0686581789695674 −20.5314854996384
chr11 32587024 32587024 * 4.02438809105929e−07 0.0154357729174998 −20.4469388219758
chr22_KI2 135172 135172 * 2.96363688962771e−06 0.0524072227605916 −20.3497023809524
chr4 991538 991538 * 5.33854116705892e−08 0.00326706364126484 −20.3204539990626
chr12 43224372 43224372 * 2.11953882648195e−06 0.0422859020989609 −20.1728148886673
chr8 59626782 69626762 * 5.55504783871103e−07 0.018876249143924 −20.1510152624661
chr13 26721860 26721860 * 8.17251545891727e−11 1.6485575975412e−05 −20.0897163804746
chr2 43068231 43068231 * 4.54198259339927e−08 0.0689077210558564 −20.0117508813161
chr3 128076766 128076786 * 3.20533786537605e−06 0.0537164527692233 −20
chr12 131003954 131003954 * 5.61128333821062e−10 6.48939161424928e−05 −19.9837734802737
chr18 87996503 87996503 * 1.58225402892991e−06 0.0361305254380479 −1S.8601232811759
chr10 75406332 75406332 * 5.85664856383476e−06 0.0804262434607438 −19.7413793103448
chr14 96204295 96204295 * 1.04834808342838e−06 0.0279299980797102 −19.8880875141545
chr7 1664582 1664582 * 5.95823912985958e−07 0.0805044069124615 −19.8078431372549
chr11 358649 358649 * 2.31581423988576e−06 0.0448824712900888 −19.4798628556983
chr1 143850348 143650348 * 1.3165886821617e−11 3.5853863813919e−08 −19.3658173178357
chr5 179658725 179559725 * 3.04205651851537e−07 0.0127450280704937 −19.3580053008737
chrX 119895971 119895971 * 4.88691500817437e−08 0.0714098888353116 −19.3543205910697
chr1 57300518 57300518 * 3.83573318221001e−06 0.0612647800081921 −19.3141254462009
chr7 47193273 47183273 * 6.26906520762471e−06 0.0626741246992889 −13.3043884220355
chr17 2564522 2564522 * 6.01856848104745e−07 0.019505471487784 −19.1745823779889
chr1 152300237 152309237 * 5.22317519281422e−06 0.0746666191660396 −18.146957214988
chr7 155491719 155491719 * 1.95916917726278e−06 0.0407935481402735 −19.1120689656172
chr17 4856719 4856719 * 3.48983540689618e−08 0.0570565231594122 −18.928284604811
chr1 4482715 4482715 * 1.55786243171406e−06 0.0361056947720476 −18.9009661835749
chr17 65080533 68080533 * 1.27315850007177e−06 0.0311141973151249 −18.8689217758985
chr13 26721781 26721761 * 4.42911525026881e−12 1.67058223393983e−06 −15.845090571655
chr1 143850313 143650313 * 1.14052091864076e−06 0.029439990801941 −15.679233544855
chr4 18246094 18246094 * 7.20843667867115e−06 0.0900474803277271 −18.5714285714286
chr1 3297266 3297268 * 5.58034558886507e−06 0.0781318961619251 −18.5441616766467
chr7 2734909 2734909 * 1.21597197481903e−06 0.0306610260790559 −18.534252179112
chr5 1594618 1594818 * 2.05158815175506e−06 0.0420537840569778 −18.3320220298977
chr10 131357947 131357947 * 1.93734979274014e−06 0.0407935481402735 −18.2949920339883
chr3 80770636 80770636 * 7.58258032655166e−07 0.0229435906044005 −18.2008484983811
chr16 830459 830459 * 4.17496376388107e−06 0.0655200906056461 −18.1907862286673
chr1 161765500 161765500 * 3.30954960805627e−06 0.06243586965581149 −18.1046387096774
chr11 5226562 5226562 * 4.53915870642454e−06 0.0689077210559564 −17.9129298588829
chr13 28721798 26721798 * 4.4231700015475e−07 0.0163882721279377 −17.7473780209758
chr1 56536130 56536130 * 7.51249604815342e−06 0.0919477519692117 −17.7083333333333
chr13 26721855 26721855 * 5.47002433996104e−07 0.0185599690146348 −17.5969869691375
chr12 131003956 131002858 * 4.01623544629303e−07 0.0154357729174998 −17.5598080124402
chr14 45254151 45254151 * 5.65838055051808e−06 0.0787913846991108 −17.3858221620178
chr3 26721846 26721846 * 2.66006932982005e−06 0.0488461024469941 −17.3768519961052
chr22_KI2 130050 130050 * 5.87711225819775e−06 0.0804262434607436 −17.3670977011494
chr5 44975082 44975082 * 5.37876922683487e−06 0.0760920092566294 −17.3486088379706
chr22_KI2 130072 130072 * 5.01577473186701e−07 0.0175117800967409 −17.1248373776797
chr18 41866148 41866148 * 2.20483911205559e−06 0.0436677786730833 −17.0957189588363
chr12 131003958 131003958 * 1.71412988543137e−06 0.0377975495131531 −17.0848357184087
chr7 157656310 157656310 * 3.8200747790465e−08 0.0612647800061921 −17.0212785957447
chr6 144085281 144085281 * 6.10346448657239e−06 0.0814766433784329 −17.0193855245142
chr10 485209 485209 * 1.96083731196005e−06 0.0407935481402735 −17.0169234312968
chr16 1259559 1259559 * 1.047808558358e−07 0.00584628788062724 −17.0055345439589
chr12 131003930 131003930 * 1.35928521793845e−08 090117521916101005 −16.8487742757084
chr9 68617416 68617416 * 2.8099492607405e−06 0.0505348061207153 −16.7318697662388
chr5 2564519 2564519 * 1.87750879951218e−07 0.00659314004745804 −16.7184593500383
chr15 93109787 93109787 * 5.61434744676122e−06 9.0764064052923216 −16.6718218373028
chr12 131003937 131003937 * 5.30496413706434e−08 0.00326706384126484 −16.5249359290018
chr14 96204284 90204284 * 2.29045450318785e−06 0.0447130542217479 −16.3123890171037
chr12 132406949 132406949 * 7.07980111491958e−03 0.0890507871491927 −16.018132631425
chr11 71791438 71791438 * 5.82850642299545e−06 0.080136415709074 −16.0022871452862
chr13 113453322 113453322 * 8.1774906695496e−07 0.0242057812564003 −15.9806806801203
chr3 87740644 87740644 * 2.60145167832789e−06 0.0466898854103074 −15.9473564783299
chr13 26721792 26721792 * 3.9940376849237e−09 0.000472902049970336 −15.8225882773086
chr1 161765471 1717658471 * 1.6712203451711e−07 0.00798446336860869 −15.5247963707599
chr8 8889239 3889239 * 8.1012273145316e−06 0.0971878186520002 −15.4873164218959
chr5 1594626 1594626 * 3.50097645817091e−06 0.0572613821970457 −15.4499854910252
chr8 83403144 83403144 * 4.60577226629857e−06 0.0692964138182791 −15.3224839057939
chr1 1901639 1901639 * 6.53811013845974e−06 0.0845756135183432 −15.3160408277929
chr17 82255105 82555105 * 4.08389779638427e−08 0.00311107462044008 −15.2852233676976
chr1 143650108 143650108 * 4.07686705382738e−07 0.0154900489905263 −15.2726613925962
chr10 26905429 26905429 * 6.46447760981082e−06 0.0642314035294549 −15.2222222222222
chr1 143650354 143650354 * 1.09106622091776e−07 0.00582596416048903 −15.1225664563142
chr1 161765462 161765462 * 3.16195968986384e−06 0.0534831930284655 −14.6486486486486
chr5 26455521 26455521 * 3.66253721183468e−06 0.059546207832716 −14.6148129799356
chr13 26721761 26721761 * 6.35885294614058e−07 0.0202454388824147 −14.5494747238933
chr2 237374812 237374812 * 6.01583863791704e−06 0.0605044069124615 −14.5002166450022
chr11 794471 794471 * 2.6771698710741e−06 0.0490948965839937 −14.4927536231884
chr8 69626749 69626749 * 1.2439216864157e−06 0.0310398119604081 −14.4447664589673
chr14 86905800 88905800 * 4.67246155974962e−08 0.069721984946531 −14.3822314292778
chr7 151395933 151385933 * 8.57553878334847e−07 0.0205825398261493 −14.261530652089
chr3 11094200 11094200 * 8.44860879825346e−06 0.0993898011935136 −14.1490765171504
chr13 26721885 26721885 * 4.6020058423205e−06 0.0692964138182791 −14.1257827634365
chr4 4862758 4862758 * 3.18455825658399e−06 0.0535329353951193 −14.0882194728349
chr22 16423589 16423589 * 8.40540642183114e−07 0.0243650192427174 −14.0361872581612
chr17 82254249 82254249 * 2.98732669855532e−06 0.0524853239544041 −14.0114068441065
chr7 103348856 103348858 * 2.9995918694905e−07 0.0126645373026706 −13.9090178673911
chr4 52076911 52076911 * 4.71292611684663e−07 0.0169628567466589 −13.834243988657
chr4 2057497 2057497 * 4.48799459056866e−06 0.0688581799695674 −13.7688591846184
chr2 241209373 241209373 * 3.12140531562756e−06 0.0532057706716535 −13.719533733663
chr16 86504405 86504405 * 2.81135946965069e−06 0.0505348081207153 −13.6753731343284
chrX 49204268 49204268 * 1.52834466908663e−06 0.0357258240919441 −13.6101272464909
chr19 18309702 18309702 * 2.96076626534099e−06 0.0524853239544041 −13.5828267477204
chr5 141728980 141728980 * 1.84054810269114e−07 0.0084959163592371 −13.2978723404255
chr7 4313888 4313888 * 6.55300859796376e−08 0.0845456135183432 −13.2412661326213
chr21 46391285 46391285 * 3.41633507937604e−08 0.0563849747852756 −13.2048872180451
chr6 40392378 40392378 * 7.16643972135821e−06 0.0897286358870338 −13.1810341200724
chr15 43592915 43552915 * 2.36424825066912e−07 0.0103014873466267 −13.1208813434035
chr17 88060764 63080764 * 2.50668574277778e−06 0.0474056133491432 −13.0354844640559
chr7 4783206 4783206 * 6.38653554754939e−06 0.0993896011935136 −13.0193236714976
chr8 11661684 11681684 * 3.96060168080359e−06 0.0628414933614009 −12.9044834307992
chr2 160217 160217 * 8.22738121183723e−07 0.0242218194330929 −12.7600966658475
chr19 55851218 55851218 * 2.10380518109225e−06 0.0422026356781289 −12.7328565078595
chr8 97277742 97277742 * 5.96115325576567e−08 0.0805044069124615 −12.6268365938794
chr19 17300931 17300931 * 8.46083165374636e−06 0.0993896011935136 −12.5751524069792
chr17 50537161 50537161 * 2.7140603178109e−06 0.0492900321709151 −12.5628140703518
chr16 57799576 57799576 * 7.27338729961737e−06 0.0902310306896508 −12.5
chr7 103348635 103348635 * 1.38193720164451e−06 0.0330118855844703 −12.43666393133
chr3 140651528 140651528 * 7.56449190844073e−05 0.0923766575412269 −12.3973727422003
chr1 181765468 161765468 * 2.14718100668988e−06 0.0426610371368189 −12.3321511556806
chr4 991509 991509 * 8.00221764880927e−06 0.0962889791844913 −12.3015873015873
chr1 16352354 16352354 * 6.28717146735685e−06 0.098982484567516 −12.0481927710843
chr7 132898870 132898870 * 2.09872777363957e−06 0.0422028306781269 −11.9698994303397
chr1 4734937 4734937 * 1.32677290581186e−06 0.0319745929969958 −11.7240033997966
chr11 77102985 77102985 * 7.23333211567592e−06 0.0901516830695011 −11.6711856441718
chr6 48984943 48984943 * 5.44744353303574e−08 0.076839904282006 −11.6263034953112
chr2 174597724 174597724 * 5.66182894733513e−07 0.0188024492491642 −11.62411141738
chr15 93109798 93109798 * 4.10145035343818e−06 0.0645621885271625 −11.5681801457061
chr2 208227882 208227882 * 9.96282827695804e−08 0.00546135333089328 −11.5132827324478
chr2 43899527 43665527 * 1.4593044288804e−06 0.0342590081787141 −11.4503816793893
chr2 208227863 208227863 * 8.40677562297141e−07 0.0243550192427174 −11.3602805463271
chr8 141311868 141311868 * 8.46248937588812e−06 0.0993896011935138 −11.1888111888112
chr2 208227880 208227880 * 9.59222417137886e−07 0.0283858844513235 −11.1607142857143
chr18 79831604 79831604 * 1.55233074871489e−06 0.0361058947720478 −11.1084462017948
chr8 23174099 23174099 * 3.24401256614072e−06 0.0541978174521009 −11.0568147686052
chr16 87644974 87644974 * 8.46724367791053e−06 0.0993896011935136 −10.9837193304288
chr5 73909679 73909679 * 1.78711816075786e−06 0.0383209636441827 −10.9370403059723
chr16 87644983 87644983 * 1.96183037358965e−06 0.0407935481402735 −10.7316563869777
chr8 31244455 31244455 * 6.41383947352033e−06 0.0839707598158959 −10.6145884058299
chr13 20988513 20988513 * 2.10782632983772e−08 0.0422028306781269 −10.5820105820108
chr5 174727258 174727258 * 8.74800733169197e−07 0.0247276251033757 −10.5327868852459
chr13 26721874 20721874 * 6.34193931008917e−09 0.000651723051262004 −10.4363028276072
chr7 112790652 11290652 * 1.1827481226228e−07 0.00625420212415542 −10.3943407585792
chr18 79420768 79420768 * 1.43210284086288e−06 0.03386889550114558 −10.2611206615656
chrX 9680823 9890823 * 5.11450989461697e−08 0.00323355106688028 −10.1117886178862
chr7 5213420 5213420 * 1.43647114850058e−06 0.0386669550114556 −10.0653741632931
chr8 144424694 144424694 * 7.95511286855291e−06 0.0960697529881829 −10.0591715976331
chr5 1594768 1594768 * 7.71097114265812e−07 0.0232031727296786 −9.97294372294372
chr19 11674350 11674350 * 8.14925655519208e−06 0.0975491442501126 −9.76600985221675
chr8 51859968 51859968 * 1.78596448155091e−06 0.0383209636441827 −9.47368421052631
chr1 171512085 171512085 * 1.83932226412483e−06 0.0392857019489325 −9.39844449032968
chr1 222473534 222473534 * 2.2551123433035e−06 0.0445016352786368 −9.22771123206127
chr8 142317199 142317199 * 6.86345119172967e−08 0.0867325588634461 −5.59760394644116
chr1 143665074 143655074 * 4.64356953105479e−08 0.0695473390748895 −8.34165077443195
chr6 27139558 27139558 * 3.00733618704668e−06 0.0524961810386534 −7.89521778860798
chr10 117134164 117134164 * 6.39179744992982e−06 0.0833864425943691 −7.69230769230789
chr3 49588913 49686913 * 4.7651168533535e−07 0.0169626587456569 −7.1856284251497
chr7 19143956 19143956 * 4.75311863477424e−06 0.0703473440743174 −5.09573982799097
chr20 9507030 9507830 * 9.74842783691994e−07 0.026644771977707 6.60146899268504
chr4 173530158 175530158 * 7.41632016665465e−06 0.091010150510033 6.78705922629237
chr6 160766958 160766958 * 7.60809065355089e−06 0.0927511294851 10.3838824583793
chrX 102769283 102769283 * 1.07147891551537e−06 0.0280567372188611 11.6205012765257
chr3 2562330 2962330 * 5.00810922871852e−06 0.072737615982152 12.7913279132791
chr6 19143512 19143912 * 3.93570692954429e−06 0.0628777699789228 13.1667315203136
chr18 14999781 14999781 * 8.61006335702661e−07 0.0247250346151797 13.9360070730185
chrX 101078809 101078809 * 3.4779754911831e−06 0.0570565231594122 14.1369357187435
chr11 134524258 134524258 * 1.29339543137739e−06 0.0314485172760425 14.7020348837209
chr22 49495230 49495230 * 3.95472048588311e−06 0.0627969489463513 14.7742818057456
chr3 58634151 58634151 * 1.77836418642227e−06 0.0383209636441827 15.0747756729811
chr14 105072092 105072092 * 4.1535117364589e−07 0.0156014285134611 15.6081445523193
chr6 170244465 170244465 * 4.80259132750802e−06 0.0706953384987618 15.8248296319728
chrX 131082922 131082922 * 5.32128227621392e−06 0.0758718573032399 15.9548648073238
chr16 34134797 34134797 * 1.26797707427921e−06 0.0311141973151349 18.0759326833878
chr6 33908181 33908181 * 4.68612236406862e−06 0.0169626567458589 16.2448100131944
chr5 141340409 141340409 * 4.51018579728723e−07 0.0165977640969179 16.3002602793177
chrX 87517786 87517786 * 4.095419664166616-07 0.0154900469905263 16.304347628067
chr9 89282539 89282539 * 4.59889238463388e−06 0.0892964138182791 16.5402223675605
chr15 70377264 70377264 * 2.05385531809394e−06 0.0420537840569776 16.6666866668667
chr8 41287058 41287058 * 5.2933562089499e−07 0.0163631295002267 16.8895372396467
chr16 87555699 87555699 * 4.4115326422306e−07 0.0163882721279377 17.1391702185899
chr13 20191903 20191903 * 5.75225863995006e−06 0.0793155206694152 17.5
chrX 38220439 38220439 * 6.04365590629082e−06 000361722238751272 17.8014454165082
chr2 3595039 3595039 * 8.00862418793134e−08 0.0962889791844913 17.808744321558
chr4 8975138 8975138 * 3.06569405476921e−08 0.0531842024802153 18.1483614582733
chr6 26575634 26575634 * 1.71994185373697e−08 0.0014027892296795 18.5060140491895
chr15 39976519 39976519 * 5.41642540434772e−07 0.0185599600146348 16.7257398055011
chr12 10184240 10184240 * 3.04233383534199e−06 0.0529394489505411 18.8100961538462
chrX 102518010 102518010 * 7.14652520671625e−08 0.004097209722755862 18.9068393261252
chr10 130045285 130045285 * 3.60989003609393e−07 0.0143512680308773 19.0101743749812
chr14 56991860 56991860 * 1.94098410171215e−06 0.0407935481402735 19.047619047619
chr10 131002291 131002291 * 1.27393301987318e−06 0.0311141973151349 19.1568521184934
chr11 2335408 2335406 * 5.11338829067549e−06 0.0734006412826014 18.3214285714256
chr6 28323335 26322335 * 2.4705966868042Se−06 0.0470492486041918 19.3381813013715
chr10 1939365 1939365 * 4.78729152727838e−06 0.0706610975744969 19.4548228059341
chrX 9786687 9786687 * 2.8528009405353e−06 0.0509434672156153 19.5652173913043
chr9 128123005 136123005 * 6.38307648440232e−10 9.3860472993744e−05 19.640315190927
chr5 141376448 141376448 * 1.72097120390688e−07 0.00815065729982553 20.0885559219437
chr3 13378377 133783777 * 4.31398536965116e−06 0.0671316765826345 20.2777777777778
chr21 44391581 44391581 * 3.87145880576543e−07 0.0152796155173413 20.513845611147
chr2 144996436 144996436 * 3.52539420708921e−07 0.0142228861820428
20.5388137042813 chr17 16600593 16890593 * 3.10825504189033e−06 0.0532057706716535 20.5541833899352 chr3 58931710 58931710 * 5.69682803373614e−06 0.0791523862100819 20.6349206349206 chr7 928901 928901 *  4.855420099465e−06 0.0712803415859173 20.8764814741899 chr17 82444953 82444953 * 3.62680894294201e−06 0.0591416739319224 20.9060867940188 chr6 13848066 13848066 * 2.87204146383313e−07 0.0122207428565902 21.0434425387696 chr3 43524113 45524113 * 1.96409627568811e−06 0.0407935481402735 21.1669041830832 chr3 133783816 133783816 * 3.17835747878176e−06 0.0535328353951163 21.6176359032502 chr1 227569821 227559821 * 1.35670287958669e−07 0.00697101073589234 21.6293573894643 chr18 14459067 14459067 * 5.64397650463842e−07 0 0186024492491642 21.6940639269406 chr2 7671793 7671793 * 5.52319620914389e−06 0.0775310411534318 22.0228420621436 chr13 91399621 91399621 * 6.01585270177058e−06 0.0805044069124615 22.1772379667117 chr9 138175007 138175007 * 1.54560809554006e−06 0.00758054895275172 22.2551928783383 chr3 156801089 156801089 * 2.06138677178192e−06 0.0422028306781269 22.4215998658393 chr20 29409561 29409561 * 2.06836443110187e−06 0.0421922484160333 22.5413223140498 chr5 1857306 1857306 * 1.61542972410629e−06 0.0362298238733574 22.8949161782745 chr4 187077799 187077799 * 6.69844507674789e−05 0.0856409096782738 23.1278619232124 chr7 928941 928941 *  6.7623954004734e−07 0.0247276251033757 23.282707961088 chrX 64205954 64205954 * 7.19774324394541e−07 0.0220238488742712 23.304941417225 chr14 72024078 72024078 * 1.11963392176793e−06 0 029036462846846 23.4848464848485 chr20 29327605 29327605 * 2.21688948158047e−07 0.00974609984702593 23.4852499558382 chr12 131682651 131682651 * 1.06654392288074e−06 0.0280567372188611 23.507228158391 chr5 141340332 141340332 * 5.45661064546844e−07 0.0185599690146348 23.920298950251 chr17 80444289 80444289 * 1.56639682506849e−06 0.0361305254380479 24.0074557315937 chr13 26742990 26742990 * 3.76167682506849e−08 0.0607950822976882 24.4664958157725 chr7 
48848154 48848154 * 3.25247853927017e−06 0.00242665581524988 24.4950978876071 chr5 140850727 140850727 * 6.16727589384563e−06 0.0821271853731615 24.6682015167031 chr1 34794600 34794600 * 1.05125449488232e−06 0.0279299960797102 24.6813933729822 chr8 142700204 142700204 * 6.20362094554192e−07 0.0199928827520275 24.7757646132225 chr18 14999751 14999751 * 3.95745105060171e−07 0.0153958753482099 25 chr18 14459028 14459028 * 3.42402841145458e−06 0.00245380S90233896 25.0255195041925 chr6 10488257 10488257 * 6.20507731296274e−06 0.0824290340108123 25.0724637881159 chr3 149098135 149098135 * 3.00012805S09595e−05 0.0524981810388534 25.2777394084101 chr18 144590001 144590001 * 1.90462950722191e−06 0.0403640053464906 25.4508687218852 chr20 29327570 25327570 * 2.97162559695699e−06 0.00224605693600296 25.4985915492958 chr11 104365610 104365610 * 2.91276278423939e−06 0.0518442457590117 25.6756756756757 chr19 312027 312027 * 1.70996790180858e−06 0.0014027892296795 25.6893615750007 chr17 38967385 38967385 * 4.71624653174529e−06 0.0699918229023937 25.8905099894848 chr3_K127

100778 100778 * 3.78783209546821e−06 0.060892915663671 25.9943769122633 chr18 34163380 34163380 * 1.18258929007038e−06 0.0301337427763311 27.0676691729323 chr20 29327582 29327582 * 1.75874955641314e−06 0.0383209636441827 27.2700962303741 chr6 44300399 44300399 * 6.23559210462214e−06 0.0824322676531253 27.2727272727273 chr2 131830102 131830102 * 5.75002517798597e−06 0.0793155206694152 27.7456239412761 chr2 3595133 3595133 * 3.49083836034092e−07 0.0141886740320074 28.1515151515151 chr7 6847529 6847529 * 1.02368761539681e−06 0.0274655488067424 28.2588530583292 chr16 33516978 33516978 * 4.64798836079939e−06 0 0695473390746895 28.4582494969819 chr12 132306911 13236911 * 2.18847746776791e−06 0 0017512694171144 28.5575902947553 chr5 180136675 180136675 * 8.37460211822385e−06 0.0993896011935138 28.5858585858588 chr1 181508966 181508966 *  2.3469478792325e−09 0 000301666627804682 28.6469200943359 chr3 195066314 195066314 * 7.15776572770304e−07 0.0220238488742712 28.8111888111888 chr2 32886786 32886786 * 8.01885771230533e−06 0.00450254066045791 28.9589540623471 chr3 31453095 31453095 * 5.71325563683518e−06 0.0791788288040146 29.3018682399213 chr6 57856838 57856838 * 6.54950554375731e−06 0 0645756135183432 29.4098143236074 chr15 55300605 55300605 *  2.0095006012122e−06 0.0414573066230963 29.4919655900016 chr11 44973084 44973084 * 4.48398099646763e−06 0.0688581799695674 29.9889462048637 chr17 44210851 44210851 * 5.05528238089629e−06 0.0730044538391841 30.0059417706476 chr20 29316291 29319291 * 2.31249539612642e−06 0.0448824712900888 30.1051401889159 chr18 84583062 84583062 * 4.89558226964777e−06 0.0714098868353116 30.1058201058201 chr18 14459005 14459005 * 1.85571362882208e−09 0.000246515352040742 30.1441127729647 chr11 67732143 67732143 * 2.71358467207263e−06 0.0492900321709151 30.2325581395349 chr15 63459143 63459143 *  6.4598735838223e−06 0.0842314035294549 30.4066402680475 chr18 14458946 14458946 * 1.31534148752621e−06 0.00115601043150803 30.6574219783925 chr17 7589975 
7589975 * 3.10161402351736e−06 0.0532057706716535 30.752320468979 chr2 217151607 217151607 * 1.48555255261793e−07 0.00742298173707828 36.8823529411765 chr12 132845363 132849383 * 2.69479562996917e−06 0.0492622950656391 31.879950226155 chr16 100279068 100279068 * 7.06249339511167e−08 0.00408210784856686 32 chr20 30898792 30896792 * 4.45025009354426e−08 0.000515707479414639 32.1693121693122 chr20 29327613 29327613 * 2.02907500000052e−11 4.80493076300122e−06 32.5396825396825 chr3 153369305 153369305 * 5.48638465154488e−07 0.0185599690146346 32.632598351335 chr2 32886723 32886723 * 7.52619510213316e−12 2.47746930570009e−06 32.8019855591285 chr22 45507251 45507251 * 2.22369779641187e−13 1,1010320235066e−07 32.9369696625706 chr7 89228935 89228935 * 1.41335375242921e−07 0.0071942242110053 33.0267825619126 chr7 928993 928993 * 1.92475388130971e−11 4.76557118932831e−06 33.726233299075 chr11 397198 397198 *  7.1495193569201e−06 0.0897230414315909 34.4482255712202 chr3 195115685 195115685 * 3.37192524891591e−08 0.00244868851904246 34.6089381374228 chrX 46851858 45851858 * 2.41048204553009e−06 0.0463910642301176 34.6503495503497 chr8 55098353 55098953 * 4.97997754605695e−08 0.0031890796454339 34.7368421052632 chr6 156156 1681567 * 4.17447598831296e−13 1.89466750621155e−07 34.8264960422164 chr14 102870466 102870466 * 3.47830651579176e−09 0.000428432670304054 34.8574359530309 chr5 1959094 1959094 * 2.49918016551001e−06 0.0474050133491432 35.627381098685 chr1 4299834 4288834 * 1.03979784054783e−08 0.000928401741010009 35.6285970748192 chr3 166224276 166224276 * 4.11858264893907e−08 0.002838092970226 35.7338562501034 chr18 14459993 14458996 * 6.50993779989158e−06 0.0845756135183432 35.7354509896883 chr8 142832873 142832875 *  5.9883304409567e−06 0.0805044089124615 35.910176779742 chr6 1681586 1681566 * 1.85285395089447e−06 0.0394201336746685 35.9133126934984 chr16 54456992 54458992 * 1.49755367782113e−11 3.88400872606384e−06 38.3604254965512 chr3 112932828 172932828 * 
8.12538588446888e−06 0.0241918488849359 38.4271760364662 chr6 189657426 189657426 * 1.98887519801073e−07 0.00899515250327591 36.9354560966828 chr15 98002095 96002005 * 3.68701724074763e−06 0.0597656032904584 37.5 chr16 93137709 33137709 * 7.28939903032013e−06 0 0902310306896508 37.8393865158371 chr6 169662531 169659531 * 7.79719153734424e−06 0.9943718696225847 33.0064173015483 chr11 133930630 133930860 * 4 66153749256147e−06 0.00311107462044008 38.0324634561923 chr1 42998349 42888349 * 8.56454997755985e−14 4.67556827763799e−08 38.172892793094 chr22 19587101 19587101 * 3 96768850381144e−06 0062819720042154 36.5690789473684 chr6 35611746 35511745 * 2.61256790394456e−06 0.0487305828366124 39.0104273154774 chr11 27861394 27861394 * 2.54336454548864e−06 0.0479322306796731 39.0323711581162 chr13 152170275 152170275 * 5.02270297767131e−06 0.0727556159209122 39.1206977840641 chr11 44973691 44973891 * 3.57594818000015e−07 0.0143202484337346 39.3649721330856 chr3 166224278 156224278 * 5.68886623855735e−09 0.00059583211509241 39.624746857426 chr2 134130068 134130066 * 4.90357801395954e−06 0.0714088888353116 40 2011073115128 chr7 99229534 99228934 * 5.19451644200866e−19 5.65837844905373e−13 40.771144278807 chr11 104369609 104365609 * 4.78408342782841e−07 0.0169628567456589 40.8163286306122 chr6 188742910 168742610 * 2.27787950946004e−08 0.0446324446716418 40.9059875742473 chr6 65058854 55098854 * 1.83374033615836e−07 0.0084959183892371 41.3793103448276 chr14 24474225 24474225 * 1.61642525166116e−06 0.0362298238733574 42.0780616865802 chr17 235124 235124 * 9.65271563943995e−09 0.000892455196703998 42.2969996841654 chr11 44973890 44973580 * 4.60111453557583e−12 1.67066223393963e−06 42.5012094823416 chr11 1414150 1414150 * 2.63289414882991e−06 0.0487744456437286 42.5505050505051 chr1 227632806 227632806 * 1.55884055040968e−07 0.00758054695275172 42.6892551724138 chr20 29689780 20689780 *  2.6417856941678e−06 0.048774456437280 42.7924528301887 chr19 7554442 7554442 * 
6.78637687224499e−07 0.0210682039883814 44.4444444444444 chr6 148879582 148879582 * 8 35300056008424e−07 0.0243550192427174 44.6864116806899 chr3 156224224 156224224 * 2.98779548730793e−14 1.80811157991764e−08 44.9087273689789 chr14 90550043 90556043 * 1.54503168335909e−07 0.00758054695275172 45.2725786689043 chr1 16230895 16230895 * 7.11401434897332e−09 0.0007044480404355789 46.5004274722143 chr13 113647978 113647879 * 6.36498701987876e−07 0.0202454388324147 47.9166668666667 chr1 107575707 107570707 * 3.12930693512951e−06 0 0532057700616535 49.2538911894902 chr19 1010407 1010407 * 7.25046348903469e−06 0.0901558944505037 49.2941515413426 chr1 153310526 153310526 * 1.41583773179048e−06 0.0336740125742141 49.5238095238095 chr18 88226977 85226977 * 5.14488367336249e−09 0.000571868729957131 53.6044644924824 chr7 92143674 92143674 * 6.83017303311138e−06 0.086512750559202 53.7322393438062 chr16 12390340 12390340 * 8.26122365062025e−15 5.64233608541426e−09 53.9408824838257 chr2 26814238 26814238 * 5-72897625338545e−07 0.0189108020195478 54.2857142857143 chr17 74543706 74543706 * 1.15153442203967e−05 0.0295840705366211 60.9452736318408 chr3 16735280 16738280 * 7.79770257209085e−16 6.06716066816175e−10 62.4486746786861 chr1 110464247 110464247 * 1.08671595058176e−06 0.0283195680914636 64.1176470588235

indicates data missing or illegible when filed

Discussion of this Example

This Example relates to a high-read-depth, comprehensive bisulfite sequencing analysis of methylation in cfDNA from the plasma of pregnant and non-pregnant women. This Example also shows the global impact of pregnancy on the plasma DNA methylome and provides methylated cfDNA biomarkers of complex gestational diseases such as preeclampsia and preterm birth, as well as non-invasive biomarkers able to capture both fetal and maternal influences on the molecular phenotype during early gestation.

This Example demonstrates that pregnancy has a profound impact on methylation signatures in plasma cfDNA. An overall reduction in DNA methylation was observed, and its magnitude depended significantly on the location of the CpG sites of interest. In general, exonic and intronic sequences in cfDNA displayed significant reductions in DNA methylation levels in pregnant versus non-pregnant women, whereas CpG islands displayed relatively fewer differences. Because the early gestational chorionic villus (CV) is known to be generally hypomethylated compared to gestational age-matched MBCs, the pregnancy-specific reductions in global DNA methylation levels in maternal plasma are strongly influenced by methylated cfDNA (mcfDNA) fragments originating in the chorionic villus. Comparing the presently disclosed plasma data with similar data from a comparison of gestational age-matched CV and maternal leukocyte (MBC) samples provided further support for this finding. Specifically, whenever a CpG site was differentially methylated both between CV and MBC and between pregnant and non-pregnant plasma, the direction of change in plasma was, in every case, predicted by the direction of change between CV and MBC. In other words, if a CpG site was hypomethylated in CV compared to MBC, its methylation level was reduced in the plasma of pregnant women compared to non-pregnant women; conversely, if a CpG site was hypermethylated in CV compared to MBC, its methylation level was elevated in the plasma of pregnant women.

High-throughput genomic approaches, including those disclosed herein, have demonstrated enormous potential both for unraveling complex heterogeneous phenotypes at the molecular level and for generating novel hypotheses that improve understanding of complex pathophysiology, including in the context of gestational disease (Hong et al., Epigenetics. 2018; 13:163-172; Strauss et al., Am J Obstet Gynecol. 2018; 218:294-314 e292; Heng et al., PLoS One. 2016; 11:e0155191). High-throughput genomic approaches have also been used with great success to develop biomarkers, including those consisting of cell-free methylated DNA signatures (Ngo, Science. 2018; 360:1133-1136; Karlas, J Transl Med. 2017; 15:106; Hardy et al., Gut. 2017; 66:1321-1328; Liggett et al., J Neurol Sci. 2010; 290:16-21). These approaches have also been applied in oncology, where a particularly promising application is the plasma-based detection of DNA methylation (Shen et al., Nature. 2018; 563:579-583).

Example Embodiment B of Example 1: Method for DNA-Methylation-Based Liquid Biopsy for Non-Invasive Pregnancy Phenotyping and Disease Diagnosis

Provided below is an algorithm that can be used to diagnose a subject with a pregnancy-associated disorder and/or pregnancy abnormality.

The methylomes of either maternal tissues or the placenta can be affected by certain types of pregnancy abnormality, and changes in these methylomes can in turn change the methylation patterns of the DNA fragments found in maternal plasma, which are released by the maternal tissues and the placenta. An algorithm was developed to identify changes in the methylation patterns of the maternal plasma methylome caused by abnormal pregnancies. The main insight behind this algorithm is that the methylome of the DNA fragments in a maternal plasma sample is a mixture of a variety of component methylomes of either maternal or fetal origin, and that the proportions of these component methylomes in the mixture vary from subject to subject, even among subjects with normal pregnancies. By modeling the maternal plasma methylome as a linear combination of various component methylomes of fetal and maternal origin, the algorithm can accurately predict the methylation patterns of a new maternal plasma sample under the hypothesis that it is from a normal pregnancy. Consequently, when a sample is from an abnormal pregnancy, the algorithm is highly sensitive to abnormal methylation patterns caused by changes in the methylomes of some fetal or maternal tissues.

Let $i$ be any CpG site in the human genome, $z_{i,j}$ the methylation level of CpG site $i$ in a maternal plasma sample $j$, $p_{i,r,j}$ the proportion of a component methylome $m_{r,j}$ of either fetal or maternal origin in maternal plasma sample $j$ at site $i$, and $m_{i,r,j}$ the methylation level of CpG site $i$ in methylome $m_{r,j}$. Our hypothesis is:

$z_{i,j} = \sum_{r=1}^{R} p_{i,r,j}\, m_{i,r,j} \qquad (1)$

where $p_{i,r,j} \geq 0$, $0 \leq m_{i,r,j} \leq 1$, and $p_{i,1,j} + \cdots + p_{i,R,j} = 1$.
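As a minimal illustration of equation (1), the following Python sketch (NumPy assumed; `mix_methylation` and the example values are hypothetical) computes the plasma methylation level at each CpG site as the proportion-weighted sum of the component methylation levels and checks the stated constraints.

```python
import numpy as np

def mix_methylation(p, m):
    """Equation (1): z_i = sum_r p_{i,r} * m_{i,r} for one plasma sample j.

    p, m: arrays of shape (n_sites, R) holding, for each CpG site i, the
    proportions p_{i,r,j} and methylation levels m_{i,r,j} of the R
    component methylomes in sample j.
    """
    p = np.asarray(p, dtype=float)
    m = np.asarray(m, dtype=float)
    # Constraints stated with equation (1).
    if np.any(p < 0) or not np.allclose(p.sum(axis=1), 1.0):
        raise ValueError("proportions must be non-negative and sum to 1 at each site")
    if np.any((m < 0) | (m > 1)):
        raise ValueError("methylation levels must lie in [0, 1]")
    return (p * m).sum(axis=1)

# Hypothetical example: two CpG sites, R = 2 components (fetal, maternal).
p = [[0.1, 0.9],
     [0.1, 0.9]]
m = [[0.2, 0.8],
     [0.9, 0.5]]
z = mix_methylation(p, m)   # approximately [0.74, 0.54]
```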

It is assumed that there is a set of CpG sites $S$ such that, for any CpG site $i$ in $S$ and any maternal plasma sample $j$ from a normal pregnancy, we have $m_{i,r,j} = m_{i,r}$ and $p_{i,r,j} = p_{r,j}$.

That is, it is assumed that in any maternal plasma sample from a normal pregnancy, the proportions of the different component methylomes in the mixture are the same for all CpG sites in $S$. It is also assumed that, when restricted to the set of CpG sites $S$, maternal plasma samples from all normal pregnancies share the same set of component methylomes. We call these the restricted reference component methylomes (RRCM) and denote them $m_1^{S}, \dots, m_R^{S}$, or simply $m_1, \dots, m_R$ when there is no ambiguity. For any maternal plasma sample $j$ from a normal pregnancy, its methylome restricted to the CpG sites in $S$ can be expressed as a weighted average of the restricted reference component methylomes. More precisely, let $z_j^{S}$ be the methylome of maternal plasma sample $j$ restricted to $S$; then, for some mixture vector $p_j = [p_{1,j}, \dots, p_{R,j}]^{T}$, we have:

$z_j^{S} = [m_1^{S}, \dots, m_R^{S}]\, p_j \qquad (2)$
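Equation (2) suggests that, when the restricted reference component methylomes are known, the mixture vector $p_j$ of a normal sample can be recovered by least squares over the probability simplex. The sketch below assumes that formulation and solves it by projected gradient descent with a Euclidean projection onto the simplex. This is a swapped-in technique for illustration, not the estimation procedure of this Example (which mentions EM, data augmentation, and maximum likelihood); the function names and simulated data are hypothetical.

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection of v onto {p : p >= 0, sum(p) = 1}."""
    u = np.sort(v)[::-1]                       # sort in descending order
    css = np.cumsum(u)
    k = np.arange(1, len(v) + 1)
    rho = k[u - (css - 1.0) / k > 0][-1]       # largest index passing the test
    theta = (css[rho - 1] - 1.0) / rho         # shift that enforces sum(p) = 1
    return np.maximum(v - theta, 0.0)

def estimate_mixture(M, z, n_iter=5000):
    """Minimize ||M p - z||^2 over mixture vectors p by projected gradient.

    M: (n_sites, R) matrix whose columns are the restricted reference
       component methylomes m_1^S, ..., m_R^S; z: observed methylome z_j^S.
    """
    M = np.asarray(M, dtype=float)
    z = np.asarray(z, dtype=float)
    p = np.full(M.shape[1], 1.0 / M.shape[1])      # start at the simplex center
    step = 1.0 / (2.0 * np.linalg.norm(M, 2) ** 2)  # 1 / Lipschitz constant of grad
    for _ in range(n_iter):
        grad = 2.0 * M.T @ (M @ p - z)
        p = project_to_simplex(p - step * grad)
    return p

# Hypothetical check: simulate a noise-free normal sample from equation (2).
rng = np.random.default_rng(0)
M = rng.uniform(0.0, 1.0, size=(50, 3))
p_true = np.array([0.2, 0.5, 0.3])
p_hat = estimate_mixture(M, M @ p_true)
```

The projection keeps every iterate a valid mixture vector, so the non-negativity and sum-to-one constraints never need to be handled separately.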

Finally, it is assumed that the set $S$ is the union of two disjoint subsets $C$ and $T$, where $C$ is the set of reference CpG sites and $T = \bigcup_{k=1}^{K} T_k$ is the union of $K$ non-empty sets $T_k$, the index $k$ denoting the $k$th type of abnormal pregnancy. The sets $T_k$ need not be disjoint. Moreover, each $T_k$ is itself the union of two disjoint sets $D_k$ and $V_k$; either $D_k$ or $V_k$ may be empty, but not both. It is assumed that for any maternal plasma sample, including one from an abnormal pregnancy, its methylome restricted to the CpG sites in $C$ can always be expressed as a weighted average of the restricted reference component methylomes. Therefore, $z_j^{C} = [m_1^{C}, \dots, m_R^{C}]\, p_j$ regardless of whether $j$ is from an abnormal pregnancy. On the other hand, for a maternal plasma sample $l$ from an abnormal pregnancy, its methylome restricted to the CpG sites in $S = C \cup T$ can no longer be expressed as a weighted average of the restricted reference component methylomes. Therefore, $w_l^{S} \neq [m_1^{S}, \dots, m_R^{S}]\, p$ for any mixture vector $p$. More specifically, for a maternal plasma sample $l$ from the $k$th type of abnormal pregnancy: (1) $w_l^{C} = [m_1^{C}, \dots, m_R^{C}]\, p_l$; (2) if $D_k$ is non-empty, then $w_l^{D_k} = [m_{1,k}^{D_k}, \dots, m_{R,k}^{D_k}]\, p_l$, where $[m_1^{D_k}, \dots, m_R^{D_k}] \neq [m_{1,k}^{D_k}, \dots, m_{R,k}^{D_k}]$; and (3) if $V_k$ is non-empty, then $w_l^{V_k} = [m_1^{V_k}, \dots, m_R^{V_k}]\, q_l$, where $p_l \neq q_l$. In other words, in a maternal plasma sample from the $k$th type of abnormal pregnancy, if $D_k$ is not empty, the component methylomes of the sample restricted to $D_k$ are no longer the same as the reference component methylomes restricted to $D_k$. If $V_k$ is not empty, the proportions of the reference component methylomes in this sample, restricted to $V_k$, are no longer the same as their proportions restricted to $C$.

Here, $T$ is the target set of CpG sites, $D_k$ is the differential-methylation target set, $V_k$ is the copy-number-variation target set, and $T_k$ is the target set for the $k$th type of abnormal pregnancy.
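To make the roles of $C$, $D_k$, and $V_k$ concrete, the following toy simulation builds an abnormal sample with two components that satisfies the three conditions stated above; all set sizes, shifts, and proportions are hypothetical. A mixture fitted on $C$ alone predicts $C$ exactly but leaves non-zero residuals on $D_k$ and $V_k$. Plain least squares is used here only because the simulated data are noise-free; it is not the constrained fit used in the implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_C, n_D, n_V = 200, 20, 20                  # hypothetical sizes of C, D_k, V_k
# Columns: reference component methylomes m_1 (fetal) and m_2 (maternal) on S.
M = rng.uniform(0.0, 1.0, size=(n_C + n_D + n_V, 2))
C = slice(0, n_C)
D = slice(n_C, n_C + n_D)
V = slice(n_C + n_D, None)
p = np.array([0.15, 0.85])                   # mixture vector p_l of sample l

w = M @ p                                    # condition (1): exact mixture on C
M_D = M[D].copy()
M_D[:, 0] = np.clip(M_D[:, 0] - 0.4, 0.0, 1.0)  # condition (2): altered fetal methylome on D_k
w[D] = M_D @ p
q = np.array([0.30, 0.70])                   # condition (3): different proportions q_l on V_k
w[V] = M[V] @ q

# Fit the mixture on C only, then predict all of S under the normal hypothesis.
p_hat, *_ = np.linalg.lstsq(M[C], w[C], rcond=None)
residual = w - M @ p_hat
max_abs = {name: float(np.abs(residual[s]).max())
           for name, s in [("C", C), ("D_k", D), ("V_k", V)]}
```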

The main steps of the algorithm of this Example are as follows:

1) Identify the sets of CpG sites $C$ and $T_1, \dots, T_K$ for the $K$ types of abnormal pregnancy.
2) Estimate the restricted reference component methylomes $m_1, \dots, m_R$, or $R$ predictor methylomes $n_1, \dots, n_R$ that are independent linear combinations of the reference component methylomes, such that $n_r = [m_1, \dots, m_R]\, q_r$ for $R$ linearly independent mixture vectors $q_1, \dots, q_R$.
3) (Optional) If the reference component methylomes are available, estimate the proportions of these components at the reference CpG sites $C$ for the test maternal plasma samples.
4) Predict the methylation levels of the test maternal plasma samples at the target set $T_k$ of CpG sites, under the hypothesis that each sample is from a normal pregnancy.
5) Compare the predicted methylation levels at $D_k$ and $V_k$ against the observed methylation levels, and reject the null hypothesis that a test sample is from a normal pregnancy if the observed methylation levels are significantly different from the predicted levels.

This algorithm can be implemented in a variety of ways. For example, given the methyl-seq data for a set of maternal plasma samples from normal pregnancies, the EM algorithm or the data augmentation method can be applied to estimate the component methylomes, and the maximum likelihood method can then be used to estimate the proportions of these component methylomes in a test sample. Below we present some simple implementations that use linear regression.

In the first simple implementation of the algorithm of this Example, it is assumed that the restricted methylome of a maternal plasma sample from a normal pregnancy can be approximated by a mixture of two restricted reference methylomes, one representing the DNA fragments from the fetus and the other representing the DNA fragments from maternal tissues. It is further assumed that estimates of these two reference component methylomes are available. For example, in the implementation below, the methylome of chorionic villus samples (CVS) is used as an approximation to the fetal methylome in the maternal plasma sample, and the methylome of plasma samples from healthy non-pregnant women (NPP) as an approximation to the maternal methylome in the maternal plasma sample. The implementation of the algorithm includes the following steps:
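The constrained regression used repeatedly in the steps below (zero intercept, two non-negative coefficients that sum to 1) has a single free parameter, so it can be solved in closed form. A minimal sketch, assuming NumPy and mean vectors `x` (CVS) and `y` (NPP) over the same CpG sites; the helper name and example values are hypothetical. The coefficient on `x` plays the role of the estimated fetal fraction.

```python
import numpy as np

def two_component_fit(z, x, y):
    """Fit z ~ a*x + (1 - a)*y with zero intercept and a in [0, 1].

    Returns (a, residual). The objective ||(z - y) - a*(x - y)||^2 is a
    convex quadratic in the single unknown a, so clipping its unconstrained
    minimizer to [0, 1] gives the exact constrained solution.
    """
    z, x, y = (np.asarray(v, dtype=float) for v in (z, x, y))
    d = x - y
    denom = d @ d
    if denom == 0:
        raise ValueError("x and y must differ at some CpG site")
    a = float(np.clip(d @ (z - y) / denom, 0.0, 1.0))
    residual = z - (a * x + (1.0 - a) * y)
    return a, residual

x = np.array([0.20, 0.30, 0.90, 0.10])   # hypothetical CVS means (fetal proxy)
y = np.array([0.80, 0.70, 0.50, 0.60])   # hypothetical NPP means (maternal proxy)
z = 0.1 * x + 0.9 * y                    # plasma sample with a 10% fetal fraction
a, res = two_component_fit(z, x, y)      # a is about 0.1, res is about 0
```

Because the problem is one-dimensional, no iterative solver is needed for the two-component case; when the unconstrained optimum falls outside $[0, 1]$ the coefficient is simply clipped to the boundary.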

1. Identify the reference set $C$ and the target sets $T_1, \dots, T_K$.

1.1. Collect the methylation data for a set of CVS samples, a set of NPP samples, and a set of maternal plasma samples from normal pregnancies. For each type of abnormal pregnancy, collect a set of maternal plasma samples from that type of abnormal pregnancy. All of these samples should be matched for age, race, gestational age (if applicable), and other relevant parameters. These are the training data.

1.2. Let $x_{i,j}$ be the observed methylation level of CpG site $i$ in a CVS sample $j$, $y_{i,j}$ the observed methylation level of CpG site $i$ in an NPP sample $j$, $s_{x,i}^2$ the sample variance of $x_{i,j}$ over all CVS samples, and $s_{y,i}^2$ the sample variance of $y_{i,j}$ over all NPP samples. Identify the set of CpG sites $S_0$ such that, for any $i \in S_0$, both $s_{x,i}^2 < c_0$ and $s_{y,i}^2 < c_0$ for some constant $c_0$.

1.3. Let $x_i$ be the sample mean of $x_{i,j}$ over all CVS samples and $y_i$ the sample mean of $y_{i,j}$ over all NPP samples. Identify the subset $C_0$ of $S_0$ such that, for any $i \in C_0$, we have … for some constant.

1.4. Let $x^{C_0}$ be the vector of $x_i$ for all $i \in C_0$, and $y^{C_0}$ the vector of $y_i$ for all $i \in C_0$. Let $z_j^{C_0}$ be the observed methylation levels of the CpG sites in $C_0$ for a normal maternal plasma sample $j$, and $w_l^{C_0}$ the observed methylation levels of the CpG sites in $C_0$ for a maternal plasma sample $l$ from some abnormal pregnancy. For each $j$, regress $z_j^{C_0}$ against $x^{C_0}$ and $y^{C_0}$, with the constraints that the intercept is 0 and the coefficients are non-negative and sum to 1, and obtain the residual $e_j^{C_0}$. Similarly, for each $l$, regress $w_l^{C_0}$ against $x^{C_0}$ and $y^{C_0}$ under the same constraints, and obtain the residual $e_l^{C_0}$. Identify the subset $C$ of $C_0$ such that, for any $i \in C$,

$\frac{e_{i,z}}{s_{i,z}} < c_2,\quad \frac{e_{i,w}}{s_{i,w}} < c_2,\quad s_{i,w}^2 < c_3,\quad \text{and}\quad s_{i,z}^2 < c_3$

for some constants $c_2$ and $c_3$, where $e_{i,z}$ and $e_{i,w}$ are the means of the residuals at CpG site $i$ over all normal samples and over all abnormal-pregnancy samples, respectively, and $s_{i,z}^2$ and $s_{i,w}^2$ are the corresponding sample variances. $C$ is the reference set of CpG sites.

1.5. Let $T_0 = S_0 \setminus C$. Let $x^{C}$ and $x^{T_0}$ be the vectors of $x_i$ and $x_h$ for all $i \in C$ and $h \in T_0$, respectively, and $y^{C}$ and $y^{T_0}$ the vectors of $y_i$ and $y_h$ for all $i \in C$ and $h \in T_0$, respectively. Let $z_j^{C}$ and $z_j^{T_0}$ be the observed methylation levels of the CpG sites in $C$ and $T_0$, respectively, for a normal maternal plasma sample $j$; $w_{l_k}^{C}$ and $w_{l_k}^{T_0}$ the observed methylation levels of the CpG sites in $C$ and $T_0$, respectively, for a maternal plasma sample $l_k$ from a pregnancy with the $k$th type of abnormality; and $w_{l_g}^{C}$ and $w_{l_g}^{T_0}$ the corresponding levels for a maternal plasma sample $l_g$ from a pregnancy with the $g$th type of abnormality, where $g \neq k$. For each $j$, $l_k$, and $l_g$, regress $z_j^{C}$, $w_{l_k}^{C}$, and $w_{l_g}^{C}$, respectively, against $x^{C}$ and $y^{C}$, with the constraints that the intercept is 0 and the coefficients are non-negative and sum to 1. Apply the fitted models to $x^{T_0}$ and $y^{T_0}$ to predict $z_j^{T_0}$, $w_{l_k}^{T_0}$, and $w_{l_g}^{T_0}$, respectively, and obtain the differences $e_j^{T_0}$, $e_{l_k}^{T_0}$, and $e_{l_g}^{T_0}$ between the predicted and observed values. Let $e_i$, $e_{i,k}$, and $e_{i,g}$ be the sample means, and $s_i^2$, $s_{i,k}^2$, and $s_{i,g}^2$ the sample variances, of the entries of $e_j^{T_0}$, $e_{l_k}^{T_0}$, and $e_{l_g}^{T_0}$ for CpG site $i$ over the normal samples, the samples from the $k$th type of abnormal pregnancy, and the samples from the $g$th type of abnormal pregnancy, respectively. Identify the subset $T_k$ of $T_0$ such that, for any $i \in T_k$,

$\frac{e_i}{s_i} < c_2,\quad \frac{e_{i,k}}{s_i} > c_{2,k},\quad \frac{e_{i,k} - e_i}{\sqrt{s_{i,k}^2 + s_i^2}} > c_k,\quad \text{and}\quad \frac{e_{i,k} - e_{i,g}}{\sqrt{s_{i,k}^2 + s_{i,g}^2}} > c_{k,g}$

for some constants $c_2$, $c_{2,k}$, $c_k$, and $c_{k,g}$. $T_k$ is the target set for the $k$th type of abnormal pregnancy.
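Two of the site-selection rules of step 1 can be sketched as follows, assuming NumPy arrays with samples as rows and CpG sites as columns: the low-variance filter of step 1.2, and the standardized-difference condition $(e_{i,k} - e_i)/\sqrt{s_{i,k}^2 + s_i^2} > c_k$ of step 1.5. Steps 1.3 and 1.4 and the remaining conditions of step 1.5 are omitted. Function names, thresholds, and data are hypothetical.

```python
import numpy as np

def low_variance_sites(X_cvs, Y_npp, c0):
    """Step 1.2: CpG sites whose sample variance is below c0 in both groups.

    X_cvs, Y_npp: arrays of shape (n_samples, n_sites); ddof=1 gives the
    sample variance.
    """
    s2x = np.var(X_cvs, axis=0, ddof=1)
    s2y = np.var(Y_npp, axis=0, ddof=1)
    return np.flatnonzero((s2x < c0) & (s2y < c0))

def standardized_difference(E_a, E_b):
    """(mean_a - mean_b) / sqrt(var_a + var_b) for each CpG site."""
    mean_diff = E_a.mean(axis=0) - E_b.mean(axis=0)
    return mean_diff / np.sqrt(np.var(E_a, axis=0, ddof=1) + np.var(E_b, axis=0, ddof=1))

def select_target_sites(E_normal, E_k, c_k):
    """Sites of T_0 with (e_{i,k} - e_i) / sqrt(s_{i,k}^2 + s_i^2) > c_k.

    E_normal, E_k: residual matrices on T_0 for normal samples and for
    samples from the k-th type of abnormal pregnancy.
    """
    return np.flatnonzero(standardized_difference(E_k, E_normal) > c_k)

X_cvs = np.array([[0.1, 0.5, 0.9], [0.1, 0.9, 0.9], [0.1, 0.1, 0.9]])
Y_npp = np.array([[0.2, 0.2, 0.3], [0.2, 0.2, 0.7]])
S0 = low_variance_sites(X_cvs, Y_npp, c0=0.01)        # only site 0 is stable in both

E_normal = np.array([[0.00, 0.01], [0.02, -0.01], [-0.02, 0.00]])
E_k = np.array([[0.30, 0.00], [0.32, 0.01], [0.28, -0.01]])
T_k = select_target_sites(E_normal, E_k, c_k=3.0)     # site 0 is shifted in type k
```

As written in the text the standardized-difference condition is one-sided; since the decision functions of step 3 allow either sign, a two-sided variant using the absolute value may be worth considering, but that is an assumption rather than part of this Example.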

2. Estimate the fetal fraction of the new maternal plasma samples to be tested. Recall that $x^{C}$ and $y^{C}$ are the mean vectors of the methylation levels of the training CVS and training NPP data for the CpG sites in the reference set $C$. For any new maternal plasma sample $t$ to be tested, let $z_t^{C}$ be the observed methylation levels of the CpG sites in $C$. Regress $z_t^{C}$ against $x^{C}$ and $y^{C}$, with the constraints that the intercept is 0 and the coefficients are non-negative and sum to 1. The estimated coefficient for $x^{C}$ is the estimated fetal fraction for maternal plasma sample $t$.

3. Test whether the new maternal plasma sample is from the $k$th type of abnormal pregnancy. For the new maternal plasma sample $t$, let $x^{T_k}$ and $y^{T_k}$ be the mean vectors of the methylation levels of the training CVS and training NPP data for the CpG sites in the target set $T_k$ identified in step 1. Apply the fitted regression model obtained in step 2 to $x^{T_k}$ and $y^{T_k}$ to predict the methylation levels of the CpG sites in $T_k$ for sample $t$ under the hypothesis that sample $t$ is from a normal pregnancy. Let $n_k$ be the number of CpG sites in $T_k$. Define the functions

$f_k(x_1, \dots, x_{n_k}) = \sum_{i} (-1)^{I(e_{i,k} - e_i)}\, x_i$

and

$f_{k,g}(x_1, \dots, x_{n_k}) = \sum_{i} (-1)^{I(e_{i,k} - e_{i,g})}\, x_i,$

where $I(x) = I_{(-\infty, 0)}(x)$ is the indicator function of the interval $(-\infty, 0)$, and $e_i$, $e_{i,k}$, and $e_{i,g}$ are the estimates obtained in step 1.5. We say that the sample is from the $k$th type of abnormal pregnancy if $f_k(e_{1,t}, \dots, e_{n_k,t}) > c_k$ and $f_{k,g}(e_{1,t}, \dots, e_{n_k,t}) > c_{k,g}$ for all $g \neq k$, where $e_{i,t}$ is the difference between the observed methylation level of CpG site $i \in T_k$ for sample $t$ and the value predicted by the fitted model obtained in step 2, and $g$ is any type of abnormal pregnancy different from the $k$th type.
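The decision rule of step 3 can be sketched as below, assuming the mean residuals from step 1.5 and the residuals $e_{i,t}$ of the test sample are available as arrays over the sites of $T_k$. Each function value is a signed sum whose sign at site $i$ is $-1$ exactly when the reference difference is negative, following the indicator $I_{(-\infty,0)}$. Function names, labels such as `"g1"`, thresholds, and values are hypothetical.

```python
import numpy as np

def signed_sum(x, e_a, e_b):
    """sum_i (-1)^{I(e_a[i] - e_b[i])} * x[i], with I the indicator of (-inf, 0)."""
    sign = np.where(np.asarray(e_a, dtype=float) - np.asarray(e_b, dtype=float) < 0, -1.0, 1.0)
    return float(sign @ np.asarray(x, dtype=float))

def classify_type_k(e_t, e_k, e_normal, e_others, c_k, c_kg):
    """Step 3 decision for one test sample t.

    e_t: residuals e_{i,t} of sample t on T_k; e_k, e_normal: mean residuals
    e_{i,k} and e_i from step 1.5; e_others: {g: mean residuals e_{i,g}};
    c_kg: {g: threshold c_{k,g}}.
    """
    if signed_sum(e_t, e_k, e_normal) <= c_k:          # f_k must exceed c_k
        return False
    return all(signed_sum(e_t, e_k, e_g) > c_kg[g]     # f_{k,g} must exceed c_{k,g}
               for g, e_g in e_others.items())

e_k = [0.20, -0.10, 0.05]
e_normal = [0.0, 0.0, 0.0]
e_others = {"g1": [0.0, 0.1, 0.0]}
e_t = [0.18, -0.12, 0.04]
f_k = signed_sum(e_t, e_k, e_normal)                    # 0.18 + 0.12 + 0.04
is_k = classify_type_k(e_t, e_k, e_normal, e_others, c_k=0.2, c_kg={"g1": 0.2})
```

The sign flip means that residuals moving in the direction expected for type $k$ add to the score whether that direction is hypo- or hypermethylation.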

Other ways of implementing the algorithm of this Example can be developed by modifying the simple implementation presented above. Specifically, we do not need to assume that there are only two component reference methylomes that make up the maternal plasma methylomes, nor do we need to approximate them by the CVS and NPP methylomes. Instead, we can collect a set of predictor methylomes that are mixtures of the component reference methylomes, as long as the number of predictor methylomes is the same as the number of reference component methylomes and the mixture vectors of the predictor methylomes are linearly independent. For example, they can be methylomes of maternal mononuclear cells (MBC), or even of normal maternal plasma samples. The only shortcoming of using predictor methylomes instead of the reference component methylomes is that a straightforward estimate of the fraction of fetal DNA fragments in a maternal plasma sample cannot always be provided.

In this algorithm, we choose the difference between the observed methylation levels in certain target regions and the predicted methylation levels as the test statistic to determine whether the methylome of a maternal plasma sample has been affected by some type of pregnancy abnormality. To illustrate the advantage of this approach, let us assume that the mixture vector p_(j) for the methylome of a normal maternal plasma sample j follows a Dirichlet distribution with parameters α₁= . . . =α_(R). Furthermore, for CpG site i, let its methylation levels in the R reference component methylomes be m_(i,r)=(r−1)/(R−1), r=1, . . . , R. It can be shown that the methylation level of site i in sample j then has a mean of 0.5 and a variance of

$\frac{R + 1}{12\left( {R - 1} \right)\left( {{R\alpha_{1}} + 1} \right)}.$

If a methyl-seq library from sample j has a coverage of N for CpG site i, the variance of the measured methylation level z_(i,j) is

${\sigma_{1}^{2} = {\frac{1}{4N} + {\frac{N - 1}{N}\frac{R + 1}{12\left( {R - 1} \right)\left( {{R\alpha_{1}} + 1} \right)}}}}.$

In other words, if we use z_(i,j) as a test statistic to detect abnormal pregnancy using a maternal plasma sample, then under the null hypothesis the test statistic has a variance of σ₁². However, in our algorithm, we first estimate the mixture vector p_(j) and then predict z_(i,j) by Σ_(r)m_(i,r)p_(r,j). Note that in methyl-seq data, millions of CpG sites can be covered in each library, and the variance of the estimated coefficients in a linear regression model is inversely proportional to the sample size. Thus it is possible to obtain a highly accurate estimate of the mixture vector p_(j), even taking into account that adjacent CpG sites tend to have correlated methylation levels. Assuming we can get an accurate estimate of Σ_(r)m_(i,r)p_(r,j), the variance of the difference z_(i,j)−Σ_(r)m_(i,r)p_(r,j) between the observed methylation level and our prediction will be

$\frac{1}{4N} - {\frac{1}{N}{\frac{R + 1}{12\left( {R - 1} \right)\left( {{R\alpha_{1}} + 1} \right)}.}}$

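In other words, under the null hypothesis, the test statistic z_(i,j)−Σ_(r)m_(i,r)p_(r,j) used in our algorithm has a much smaller variance than the other candidate test statistic z_(i,j): subtracting the two expressions shows that σ₁² exceeds it by exactly the between-sample variance (R+1)/(12(R−1)(Rα₁+1)), which is removed once the mixture vector of the sample is accounted for. This in turn means that our test will achieve higher power at the same level of type I error.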
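The variance formulas above can be checked numerically. The following Python/NumPy sketch is illustrative only; the function names are hypothetical. It computes the exact mean and variance of Σ_r m_(i,r) p_(r,j) from the Dirichlet covariance matrix, Cov(p_r, p_s) = (δ_rs μ_r − μ_r μ_s)/(Rα₁ + 1) with μ_r = 1/R, evaluates the two closed-form test-statistic variances, and runs a small Monte Carlo simulation that draws mixture vectors and then N bisulfite reads per sample. It assumes, as the text does, that the mixture vector is known exactly when forming the residual.

```python
import numpy as np


def level_moments(R, alpha):
    """Exact mean and variance of sum_r m_r p_r, p ~ Dirichlet(alpha, ..., alpha).

    Uses m_r = (r - 1)/(R - 1) and the Dirichlet covariance matrix.
    """
    m = np.arange(R) / (R - 1)                  # 0, 1/(R-1), ..., 1
    mu = np.full(R, 1.0 / R)                    # E[p_r] for equal parameters
    cov = (np.diag(mu) - np.outer(mu, mu)) / (R * alpha + 1.0)
    return float(m @ mu), float(m @ cov @ m)


def closed_form_variance(R, alpha):
    """Between-sample variance (R + 1) / (12 (R - 1) (R alpha + 1))."""
    return (R + 1) / (12.0 * (R - 1) * (R * alpha + 1))


def raw_statistic_variance(R, alpha, N):
    """sigma_1^2: variance of the observed level z_(i,j) at coverage N."""
    v = closed_form_variance(R, alpha)
    return 1.0 / (4 * N) + (N - 1) / N * v


def residual_statistic_variance(R, alpha, N):
    """Variance of z_(i,j) - sum_r m_(i,r) p_(r,j) with the mixture known exactly."""
    v = closed_form_variance(R, alpha)
    return 1.0 / (4 * N) - v / N


def simulate(R, alpha, N, n_samples, seed=0):
    """Monte Carlo estimates of both variances for one CpG site."""
    rng = np.random.default_rng(seed)
    m = np.arange(R) / (R - 1)
    p = rng.dirichlet(np.full(R, alpha), size=n_samples)  # shape (n_samples, R)
    level = np.clip(p @ m, 0.0, 1.0)                       # true level per sample
    z = rng.binomial(N, level) / N                         # observed level from N reads
    return float(np.var(z)), float(np.var(z - level))
```

For example, with R = 4, α₁ = 1 and N = 30, the closed forms give 38/1080 for σ₁² and 8/1080 for the residual statistic, a ratio of 4.75, and `simulate(4, 1.0, 30, 200_000)` should return values close to both.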

Example Embodiment C of Example 1: Non-Invasive Analysis of Methylated Cell-Free DNA in Preterm Birth

Preterm birth (PTB) is one of the most important problems in modern obstetrics. In 2010, more than 1 million infants born preterm (at less than 37 weeks of gestation) died worldwide, making preterm birth the second leading cause of death in children under the age of 5 years. Preterm infants who survive are at risk of chronic lung disease, deafness, blindness or other visual impairment, and learning and cognitive disabilities. The 12% rate of preterm birth in the United States ranks 131st of 184 countries, behind many developing nations. The past 3 decades in the United States have seen little decline in preterm births, including the earliest deliveries, which cause the most morbidity and mortality. Identifying potential targets for preterm birth prevention is a public health priority. One reason for the lack of predictive biomarkers for PTB is the difficulty of analyzing tissue samples from ongoing pregnancies. Consequently, relevant fetal tissues (cord blood and placenta) have largely been characterized at birth, so that many studies are confounded by comparisons between samples collected at premature birth and controls collected at normal term. Thus, it is challenging, if not impossible, to separate gestational age differences from true markers of disease pathology.

Spontaneous preterm birth is a physiologically heterogeneous syndrome. The cascade of events that culminate in spontaneous preterm birth has several possible underlying pathways. Four of these pathways are supported by a considerable body of clinical and experimental evidence: excessive myometrial and fetal membrane overdistention, decidual hemorrhage, precocious fetal endocrine activation, and intrauterine infection or inflammation. These pathways may be initiated weeks to months before clinically apparent preterm labor. The processes leading to preterm parturition may originate from one or more of these pathways; for example, intrauterine infection or inflammation and placental abruption often coexist in preterm births. Decidual hemorrhage and intrauterine infection share several inflammatory molecular mechanisms that contribute to parturition. The understanding of the nature of the molecular cross-talk among these pathways is in its infancy.

The etiologic heterogeneity of preterm birth adds complexity to therapeutic approaches. Although the ultimate clinical presentation of women with preterm labor may appear to be homogeneous, the antecedent contributing factors probably differ considerably from woman to woman. A key feature of pathologic parturition is the dynamic interplay of the mother and the fetus/placenta. A critical element of the present disclosure is the notion that the approach disclosed herein permits both cross-sectional and longitudinal assessment of maternal and placental “signatures” from the readily accessible maternal blood compartment.

First, by delineating the factors predictive of PTB, the present disclosure provides a better understanding of the mechanisms and biologic pathways that lead to spontaneous preterm parturition. Second, the use of predictors of sPTB permits the identification of a group of women at the highest risk, for whom an intervention may be tested and for whom intervention is most critically needed. Third, by identifying women at low risk of PTB, unnecessary, costly, and sometimes hazardous interventions might be avoided.

Two fundamental compartments, the maternal and the fetal, provide the sources of biomarkers. The maternal compartment may be subdivided into blood, saliva, urine, cervix, and vagina. The fetal compartment may be subdivided into placenta, cord blood and amniotic fluid. The maternal plasma compartment contains biomarkers derived from both mother and fetus and offers several distinct theoretical advantages. Obtaining maternal plasma is less invasive than obtaining amniotic fluid, placental biopsy or fetal blood. Maternal blood is drawn routinely at several points in time during prenatal care and, thus, a plasma biomarker could be incorporated into the usual provision of care. Using maternal plasma for biomarkers of PTB also facilitates central processing and analysis. After minimal processing at the point of care, blood specimens may be transported to a central facility for processing and analysis.

It is known that plasma contains fragmented “cell-free” DNA that exists outside of intact cells. In addition to containing cell-free maternal (self) DNA, plasma from pregnant mothers contains significant amounts of DNA from the developing fetus. Methods of non-invasive detection of fetal aneuploidy have been developed using maternal plasma DNA in early gestation. This was achieved via the DNA sequencing of circulating cell-free fetal DNA in maternal plasma. Recent advances in the genomic analysis of circulating cell-free DNA (cfDNA) have enabled the development of sophisticated methods for detecting fetal aneuploidy. A small number of groups have extended these methods to enable the quantification of DNA methylation levels in cfDNA from maternal plasma. This is an attractive avenue of research because DNA methylation patterns are associated with cellular phenotype, are altered in complex gestational disease states, and are cell lineage-specific. Thus, DNA methylation signatures identified in plasma potentially contain information relating to both pathobiology and the cell lineage-specific origins of the signal. However, previous analyses of methylated cell-free DNA (mcfDNA) in maternal plasma have consisted only of proof of concept studies performed at low-resolution. None have generated detailed insights into normal temporal or pathobiological changes in DNA methylation that may occur during pregnancy.

Thus, despite its biological importance and potential clinical utility, this field is in its infancy. Critical knowledge gaps include an understanding of how methylated cell-free DNA signatures are modulated during gestation in normal pregnancies, how these are influenced by both maternal and fetal cell lineages and how they are influenced in the context of disease.

Because DNA methylation is intimately linked to cellular phenotype and modified in the context of gestational diseases and environmental exposures, the characterization of methylated cell-free DNA (mcfDNA) provides an opportunity to identify altered epigenetic signatures in early gestation that are associated with gestational disease and other negative pregnancy outcomes. This is particularly relevant given that PTB is frequently associated with abnormal placental phenotypes including malperfusion and inflammation.

This Example Embodiment C relates to novel methodology that addresses these gaps. The presently disclosed methodology assesses methylated cell-free DNA (mcfDNA) signatures in maternal plasma at high resolution and provides experimental and computational tools to dissect these signatures such that their maternal and fetal inputs may be distinguished and quantified. The presently disclosed methodology has identified putative DNA methylation signatures in early gestation (11-13 weeks) that are associated with an eventual diagnosis of preeclampsia, as disclosed in Example Embodiment A.

The present disclosure of this Example establishes a paradigm for non-invasive pregnancy monitoring in which phenotypic information relating to both the mother and the fetus is ascertained both in the context of normal development and in spontaneous preterm birth (sPTB). Importantly, novel molecular and computational methods have been developed to enable these goals. This creates an opportunity in which epigenomic signatures in maternal plasma can be exploited to define biomarkers for sPTB without the need for risky and invasive tissue biopsy. The present disclosure of this Example discloses applying the presently disclosed methodology in the area of sPTB. The present disclosure of this Example has the ability to distinguish the cell lineage-specific contributions to plasma DNA methylation signals. This requires DNA methylation data for key reference cell lineages, for example cytotrophoblasts and syncytiotrophoblasts. Therefore, the present disclosure of this Example relates to optimized methods to assess genome-wide DNA methylation signatures in specific cell lineages, in which dilution effects caused by heterogeneous samples are avoided. These methods can be used in combination to perform high-resolution analysis of DNA methylation in the context of normal development and spontaneous preterm birth.

The present disclosure of this Example relates to novel approaches for non-invasive pregnancy monitoring and early detection (or exclusion) of sPTB. A major barrier to progress has been that methylation profiling of maternal plasma DNA is technically challenging and expensive. Previous genome-wide bisulfite sequencing approaches in pregnancy have resulted in low-coverage whole-genome data from low numbers of samples. However, the presently disclosed molecular and computational methods of this Example allow accurate bisulfite sequencing of plasma cell-free DNA at high read depth. The present disclosure of this Example also relates to methods for whole-genome bisulfite DNA sequencing of ultra-low input DNA samples obtained via laser capture microdissection. The present disclosure of this Example further relates to reference data from homogeneous populations of cytotrophoblasts, syncytiotrophoblasts and other placental cell types. The present disclosure of this Example determines DNA methylation differences between mother and fetus and identifies biomarkers for the non-invasive prediction of sPTB.

The present disclosure of this Example further relates to clinical liquid biopsy assays for non-invasive pregnancy monitoring and risk assessment for sPTB.

Study 1

To characterize temporal changes in cfDNA methylation signatures in maternal plasma across the gestational age span, dynamic changes in methylated cell-free DNA signatures in maternal plasma across the gestational age range in primigravids who experience normal pregnancy are profiled. Novel computational tools and genetic approaches are used to further characterize these signatures with respect to their maternal and/or fetal origins. To conduct parallel analysis of DNA methylation in maternal leukocytes, dynamic changes in DNA methylation patterns of maternal leukocytes (from the same blood samples processed above) are profiled to understand how these are modulated throughout gestation. These data illuminate the leukocyte contribution to methylated cell-free signatures identified in maternal plasma.

A high-resolution temporal analysis of the circulating cell-free DNA methylome during gestation in maternal plasma from normal outcome pregnancies is performed. Such analysis improves the understanding of the biology of cell-free nucleic acids throughout gestation and provides a framework from which to explore pathogenesis of PTB.

The temporal dynamic changes in mcfDNA in all three trimesters of pregnancy in matched primigravid women who undergo normal pregnancy and deliver at term were characterized. Genotypic differences between mother and fetus, resulting from the fetal inheritance of uniquely paternal variants not possessed by the mother are explored to identify the origin (fetal, maternal or both) of mcfDNA in maternal plasma and determine the fraction of fetal DNA fragments (Koh et al., Proc Natl Acad Sci USA. 2014; 111:7361-7366). This allows the identification of both paternally- and maternally-inherited sequence variants in circulating methylated DNA fragments.
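The use of uniquely paternal variants to estimate the fraction of fetal DNA fragments can be sketched as follows. This is an illustrative sketch, not the method of Koh et al., and it rests on three stated assumptions: (i) biallelic SNPs with parental genotypes coded "AA", "AB" or "BB" from the microarrays; (ii) a SNP is treated as informative when the mother is homozygous and the father is homozygous for the other allele, so that the fetus must be heterozygous without needing a fetal genotype; and (iii) the common approximation that the fetal fraction is twice the fraction of plasma reads carrying the paternal allele at such sites. The record type and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SnpCounts:
    maternal: str   # maternal genotype: "AA", "AB" or "BB"
    paternal: str   # paternal genotype: "AA", "AB" or "BB"
    a_reads: int    # plasma reads carrying allele A
    b_reads: int    # plasma reads carrying allele B


def is_informative(s):
    """Mother homozygous and father homozygous for the other allele."""
    homozygous = ("AA", "BB")
    return s.maternal in homozygous and s.paternal in homozygous and s.maternal != s.paternal


def fetal_fraction(snps):
    """Pooled estimate: 2 * paternal-allele reads / total reads at informative SNPs."""
    paternal_reads = total_reads = 0
    for s in snps:
        if not is_informative(s):
            continue
        # The paternal allele is the one the mother does not carry.
        paternal_reads += s.b_reads if s.maternal == "AA" else s.a_reads
        total_reads += s.a_reads + s.b_reads
    if total_reads == 0:
        raise ValueError("no informative SNPs with read coverage")
    return 2.0 * paternal_reads / total_reads
```

Pooling reads across sites, rather than averaging per-site ratios, keeps low-coverage SNPs from dominating the estimate. The same informative sites also mark individual plasma fragments that carry the paternal allele as fetal in origin.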

Identification of Fetoplacental Methylation Patterns in Maternal Plasma:

mcfDNA patterns in plasma samples from n=6 pregnant women who had normal pregnancy outcomes (gestational age 11-13 weeks) were compared with those in plasma of n=12 non-pregnant controls, and mcfDNA patterns associated with pregnancy were identified. This is important for understanding the epigenomic changes that occur during early gestation and is a first step towards the characterization of the early gestational methylome in maternal plasma. Such pregnancy-specific signals can originate in maternal hematopoietic cells, can be fetoplacental, and/or can be derived from other (non-hematopoietic) adult organ systems. As shown in FIG. 10, hierarchical clustering of the data clearly separates pregnant from non-pregnant individuals on the basis of mcfDNA profile. Previous experiments using DNA from chorionic villus (CV) samples and matched maternal leukocytes indicated that CpG methylation levels in the latter are distributed in a biphasic manner, with two peaks reflecting largely hypomethylated and hypermethylated states, respectively. This is not the case in the early gestational placenta, which contains relatively fewer highly methylated CpG sites (>80%), more partially methylated sites (20-80%) and slightly more hypomethylated sites (<20%) (FIG. 13A). The CpG methylation distributions of circulating cell-free plasma DNA from non-pregnant women (FIG. 9A) largely reflected those of maternal leukocytes (FIG. 13A). Although this distribution is preserved in the plasma of pregnant women, there is a notable reduction in highly methylated sites and an increase in partially methylated sites (FIG. 11A). Given that the early gestational placenta displays a significant reduction in highly methylated (>80%) CpG sites, this shift in the distribution of CpG methylation in the plasma of pregnant women, compared with non-pregnant women, reflects the presence in maternal plasma of significant numbers of placentally derived circulating cell-free DNA fragments.
Thus, patterns of CpG methylation in the CV genome are broadly visible via the window of maternal plasma.
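The three methylation classes used above (hypomethylated <20%, partially methylated 20-80%, highly methylated >80%) can be tabulated from per-site read counts with a short Python sketch. It assumes per-CpG counts of methylated and unmethylated reads (for example, as reported by a bisulfite aligner) and a minimum-coverage filter; the function names and the default `min_coverage` of 10 are hypothetical choices, not values from this disclosure.

```python
def site_levels(counts, min_coverage=10):
    """Methylation level (methylated / total reads) for CpG sites with enough coverage.

    counts: iterable of (methylated_reads, unmethylated_reads) per CpG site.
    """
    levels = []
    for meth, unmeth in counts:
        total = meth + unmeth
        if total >= min_coverage:
            levels.append(meth / total)
    return levels


def methylation_distribution(levels):
    """Fraction of sites that are hypo (<20%), partial (20-80%) or high (>80%)."""
    n = len(levels)
    if n == 0:
        raise ValueError("no sites passed the coverage filter")
    bins = {"hypo": 0, "partial": 0, "high": 0}
    for x in levels:
        if x < 0.2:
            bins["hypo"] += 1
        elif x > 0.8:
            bins["high"] += 1
        else:
            bins["partial"] += 1
    return {k: v / n for k, v in bins.items()}
```

Comparing the resulting fractions between pregnant and non-pregnant plasma gives a simple numerical summary of the shift from highly methylated toward partially methylated sites described above.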

The influence of CpG site location on these methylation distributions was examined by considering, in turn, the sites present in defined genomic elements, specifically exons, introns, promoters, CpG islands (CGIs), CGI shores and enhancers. To reduce potential bias created by the presence of CGIs within other elements, regions of interest were filtered to exclude those that overlap with CGIs (with the exception, of course, of CGIs themselves). As shown in FIGS. 9B-9G, methylation distributions were highly influenced by CpG site location. CpGs located in promoters were less frequently hypermethylated than all sites (FIG. 9B). CpGs in introns and exons were less frequently hypomethylated than all sites (FIGS. 9C and 9D).
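The CGI-overlap filter described above can be sketched as a simple interval operation in Python. This is a minimal sketch assuming half-open [start, end) coordinates (as in BED files) and regions given as (chromosome, start, end) tuples; the function names are hypothetical. In practice a dedicated tool such as `bedtools intersect -v` performs the same exclusion.

```python
import bisect
from collections import defaultdict


def build_cgi_index(cgis):
    """Sort and merge CGI intervals per chromosome; intervals are half-open [start, end)."""
    by_chrom = defaultdict(list)
    for chrom, start, end in cgis:
        by_chrom[chrom].append((start, end))
    index = {}
    for chrom, intervals in by_chrom.items():
        merged = []
        for start, end in sorted(intervals):
            if merged and start <= merged[-1][1]:     # overlapping or adjacent: merge
                merged[-1][1] = max(merged[-1][1], end)
            else:
                merged.append([start, end])
        index[chrom] = ([s for s, _ in merged], merged)
    return index


def overlaps_cgi(index, chrom, start, end):
    """True if [start, end) overlaps any CGI on chrom."""
    if chrom not in index:
        return False
    starts, merged = index[chrom]
    i = bisect.bisect_left(starts, end) - 1           # last CGI starting before `end`
    return i >= 0 and merged[i][1] > start


def exclude_cgi_overlaps(regions, index):
    """Keep only regions (chrom, start, end) that do not overlap a CGI."""
    return [r for r in regions if not overlaps_cgi(index, *r)]
```

Following the text, the filter would be applied to the exon, intron, promoter, CGI shore and enhancer region sets, but not to the CGI set itself.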

Distributions in CGIs appeared similar to those in promoters, except that even fewer highly methylated (HM) sites exist (FIG. 9E). CpGs in CGI shores and enhancers were more frequently methylated to intermediate levels, and the former displayed reduced numbers of HM sites (FIGS. 9F and 9G). Pregnancy resulted in a clear relative reduction in highly methylated sites (>80%) in every case (FIGS. 9B-9G), reflecting the presence in maternal plasma of a significant number of relatively hypomethylated CV genome equivalents. Additionally, a small reduction in hypomethylated sites was observed in all regions except promoters, along with a slight increase in partially methylated sites, particularly in CGI shores and enhancers.

To examine the spatial differences in CpG methylation between cell-free plasma DNA from pregnant and non-pregnant women, CpG methylation levels were plotted relative to genomic coordinates, both genome-wide and for each autosome. As shown in FIG. 17, methylation levels were broadly reduced in DNA extracted from maternal plasma relative to non-pregnant female plasma, further indicating that the hypomethylated placenta is a significant contributor to the CpG methylation signature of cell-free plasma DNA in the first trimester of pregnancy.

The impact of genomic location on the ability to detect pregnancy-specific DNA methylation signals in maternal plasma was also examined. The methylation rates of CpG sites present in each of the same structural and regulatory genomic elements (see above) were compared between pregnant and non-pregnant control plasma DNA. As before, regions of interest were filtered to exclude those that overlapped with CGIs. As shown in FIGS. 12A-12F, CpGs within exons (FIG. 12B) and introns (FIG. 12C) displayed the most significant pregnancy-specific differences in methylation, followed by those in promoters (FIG. 12A). In contrast, sites within CGI shores, enhancers and CGIs (FIGS. 12D-12F) showed almost no difference in methylation level between pregnant and non-pregnant plasma. This indicates that the impact of the first trimester placental methylome on CpG methylation levels in maternal plasma is influenced significantly by genomic location.

Bisulfite DNA sequencing data is generated from serial samples of maternal plasma and paired leukocytes obtained from 6 normal term primigravid pregnancies within gestational age windows centered at weeks 12, 26, and 39 of gestation. All samples are obtained from women with normal healthy pregnancies who went on to deliver at term. Six matched non-pregnant controls are analyzed. Individuals are matched for gestational age, race/ethnicity, fetal sex, smoking history, and BMI. DNA is extracted from maternal plasma as previously described (Chu et al., PLoS One. 2017; 12:e0171882). Bisulfite-converted DNA libraries undergo targeted capture using the SeqCap Epi CpGiant Enrichment Kit (Roche, Pleasanton, Calif.) and are sequenced to a read depth of ~150×. This approach targets 80.4 Mb of the human genome and 5.5 million individual CpG sites. The Bismark Bisulfite Read Mapper (Krueger et al., Bioinformatics. 2011; 27:1571-1572) is used to align the libraries against the GRCh38 reference genome and determine the methylation status of each CpG site. The beta-binomial tests implemented in the R packages methylSig (Park et al., Bioinformatics. 2014; 30:2414-2422) and DSS (Park et al., Bioinformatics. 2016; 32:1446-1453) are used to compare the methylation status of groups of samples. Specifically, for each pair of sample groups and each CpG site with sufficient coverage, it is tested whether the methylation rate of that CpG site is the same in cases and controls. The origin (maternal or fetal) of mcfDNA species is determined via the detection of polymorphic sequence variants that are uniquely paternal and therefore can only be present in fragments derived from the fetus. Informative variants are single nucleotide polymorphisms (SNPs) for which the mother is a homozygote and the fetus is a heterozygote (Koh et al., Proc Natl Acad Sci USA. 2014; 111:7361-7366). Reference genotypes are obtained for mothers and fathers using oligonucleotide microarrays.
The p values of the beta-binomial tests for methylation states are adjusted using the Benjamini and Hochberg method to control the false discovery rate (Benjamini et al., J Roy Stat Soc B Met. 1995; 57:289-300).
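The Benjamini-Hochberg step-up adjustment can be sketched in a few lines of Python. This is a minimal sketch of the standard procedure, not code from methylSig or DSS; the function names and the default FDR level of 0.05 are hypothetical. Established implementations include R's `p.adjust(p, method = "BH")` and `statsmodels.stats.multitest.multipletests(p, method="fdr_bh")`.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order.

    The adjusted value for rank i is min over ranks k >= i of p_(k) * m / k, capped at 1.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])   # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):                         # walk from the largest p down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant_sites(site_ids, pvalues, fdr=0.05):
    """CpG sites whose adjusted p-value is at or below the chosen FDR level."""
    q = benjamini_hochberg(pvalues)
    return [s for s, qi in zip(site_ids, q) if qi <= fdr]
```

Taking the running minimum from the largest p-value downward enforces monotonicity, so a site with a smaller raw p-value never receives a larger adjusted value than a site ranked after it.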

Besides characterizing the temporal dynamic changes in levels of mcfDNA, pregnancy-specific circulating mcfDNA signatures are defined and, in many cases, determined to be maternally or fetally derived. This insight improves the understanding of the biology and epigenomic landscape of circulating cell-free nucleic acids in pregnancy.

Study 2

To identify DNA methylation signatures in placenta associated with cell lineage, gestational age and spontaneous preterm birth, cell-type-specific DNA methylation signatures are characterized in both early gestational chorionic villus biopsies (12 weeks) and placental villus samples obtained at normal delivery (39 weeks). Placental and paired plasma and leukocyte samples obtained at delivery are also investigated in a group of women whose pregnancies resulted in preterm birth with confirmed placental malperfusion and inflammation. Cell type-specific analysis of these placentas improves the understanding of DNA methylation signatures associated with defined placental pathology in sPTB. Furthermore, plasma and leukocytes from these sPTB individuals are compared with plasma and leukocytes from gestational age-matched controls whose pregnancies are progressing normally and later deliver at term.

There are limited publicly available data describing the genome-wide DNA methylation architecture of specific human cell types in reproductive systems, particularly in early gestation. A number of studies (Chu et al., PLoS One. 2011; 6:e14723; Bunce, Prenat Diagn. 2012; 32:542-549; Chu et al., Prenat Diagn. 2009; 29:1020-1030) have provided insight, but these are generally restricted to analyses of heterogeneous tissue biopsies containing multiple cell types. Given that previous evidence points towards trophoblasts as the primary contributors to fetal DNA fragments in maternal plasma (Alberry, Prenatal diagnosis. 2007; 27:415-418; Lo et al., Pediatr Res. 2003; 53:16-17), it is essential to definitively characterize DNA methylation in purified cytotrophoblasts and syncytiotrophoblasts. This generates critical information relating to the cell-type specificity of these epigenetic profiles and their key differences compared to maternal leukocytes. It also provides fundamental insight into their epigenomic architecture in early gestation, at term, and in the context of sPTB pathobiology with placental malperfusion and inflammation. Plasma and leukocytes are collected from gestational age-matched controls with ongoing normal pregnancies that later deliver at term. These are compared with plasma and leukocytes obtained at delivery from mothers who experience sPTB. The presently disclosed subject matter of this Example can characterize placental and non-hematopoietic maternal DNA methylation signals in plasma, which enables insights into the sPTB phenotype in a developmentally appropriate, gestational age-matched context. This has not been achieved in prior studies using conventional approaches because of the need to compare sPTB samples with normal controls collected at term.

Comprehensive comparisons of DNA methylation between first trimester chorionic villus (CV) tissue and maternal leukocytes were performed previously. These experiments were first performed using a novel custom microarray assay that was developed to provide a relatively unbiased assessment of methylation across chromosomes 13, 18 and 21 (Chu et al., PLoS One. 2011; 6:e14723; Bunce, Prenat Diagn. 2012; 32:542-549). These experiments did not specifically focus on regions of perceived biological significance, and so the data provide unbiased insight into the epigenetic architecture of the first trimester chorionic villus. The results confirmed that CV tissue is generally hypomethylated relative to maternal leukocytes (FIG. 19). They also revealed a richly detailed picture of how placental methylation is organized. For example, numerous partially methylated regions were identified in which multiple CpG sites were methylated at between 20 and 80% in CV tissue. This is illustrated on a genome-wide level in FIGS. 20A-20B, in which the frequency of methylation is plotted at specific CpG sites in maternal leukocytes (FIG. 20A) and CV (FIG. 20B). There was a clear bimodal distribution in the former, with large numbers of CpG sites that were either completely hypermethylated or completely hypomethylated. This distribution was not evident in CV genomes, which displayed significantly fewer fully hypermethylated sites and significantly more partially methylated sites. A considerable number of regions were further identified in which the chorionic villus genome was totally unmethylated relative to its maternal leukocyte counterpart. These were termed tissue-specific differentially methylated regions (T-DMRs). Notably, numerous sites were also found at which this situation was reversed.
To investigate whether there were significant differences in the spatial location of these T-DMRs and whether they are clustered together or dispersed randomly throughout the genome, analysis was focused on broad differentially methylated genomic regions rather than individual T-DMRs.

In order to identify broad regions of interest, a "sliding windows" approach was used (Chu et al., PLoS One. 6:e14723). It was observed that the CV and maternal leukocyte genomes shared common features but that T-DMRs tended to cluster together in distinct chromosomal locations. Notably, regions dense in T-DMRs were also those that encode the fewest structural genes (compare the top and middle panels of FIGS. 21A-21C with the corresponding bottom panel). Thus, placental T-DMRs are more likely to be found outside coding regions of the genome.
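A sliding-windows count of T-DMRs along a chromosome can be sketched as follows. This is a generic illustration, not the procedure of Chu et al.: the window size, step and density threshold are hypothetical parameters, positions are assumed to be 0-based coordinates of T-DMR CpG sites on a single chromosome, and windows are half-open [start, start + window). If the chromosome length is not a multiple of the step, the last few bases past the final window are not counted.

```python
from bisect import bisect_left


def sliding_window_counts(positions, chrom_length, window, step):
    """Count T-DMR positions in each window [start, start + window)."""
    pos = sorted(positions)
    counts = []
    last_start = max(chrom_length - window, 0)
    for start in range(0, last_start + 1, step):
        end = start + window
        n = bisect_left(pos, end) - bisect_left(pos, start)  # positions in [start, end)
        counts.append((start, end, n))
    return counts


def dense_windows(counts, min_sites):
    """Windows containing at least min_sites T-DMRs (threshold is hypothetical)."""
    return [(s, e) for s, e, n in counts if n >= min_sites]
```

Running the same counting function on gene start coordinates gives per-window gene counts, which allows the inverse relationship between T-DMR density and structural gene density to be compared window by window.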

Bisulfite Sequencing for Comprehensive Analysis of DNA Methylation

To further develop the understanding of DNA methylation in the early gestational placenta and to demonstrate the proficiency of the relevant methodology, a detailed analysis of first trimester CV and maternal leukocyte methylation profiles was performed using targeted bisulfite sequencing. A commercially available oligonucleotide panel (Agilent Methylseq) was used to target an 80.4 Mb region covering all human chromosomes, with emphasis on the capture of the exome, promoters, and known CpG islands. For each sample, an average sequencing depth of 29× coverage was achieved. The distributions of single CpG sites identified as differentially methylated (DM) between CV and MBC were examined. A multiple-testing significance filter of q ≤ 0.1 and a methylation difference filter of more than 10% in either direction were used for these analyses. FIG. 22 shows patterns of differential methylation between CV and maternal leukocytes within a representative region of chromosome 1. Only statistically significant differences in DNA methylation (identified using the software methylSig) were included (Park et al., Bioinformatics. 2014; 30:2414-2422). Differential methylation of a subset of discrete CpG sites was confirmed using targeted bisulfite next generation DNA sequencing (data not shown) as previously described (Chu et al., PLoS One. 2014; 9:e107318). The distribution of DM sites was examined in the context of a variety of genomic elements to determine whether these distributions differ between sites that are hypermethylated in CV relative to MBC and those that are hypomethylated. As shown in FIG. 23, the methylation differences between CV and MBC at DM sites depended on genomic location. Specifically, exons, introns and promoters contained fewer DM sites hypermethylated in CV than sites hypermethylated in MBC. In contrast, enhancers, CGIs and CGI shores contained more DM sites hypermethylated in CV than sites hypermethylated in MBC.
These results suggest that CpGs in genomic elements associated with the control and initiation of transcription are more likely to display higher levels of CpG methylation in CV than in MBC.
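The site-level filter (q ≤ 0.1 and a methylation difference of more than 10 percentage points) and the tally of DM sites by genomic element and direction can be sketched in Python. This is an illustrative sketch assuming each site already carries an element annotation, a q-value from the differential test, and a difference defined as CV minus MBC methylation in percentage points; the record type and function names are hypothetical.

```python
from collections import Counter
from typing import NamedTuple


class DmSite(NamedTuple):
    element: str    # genomic element annotation, e.g. "promoter" or "enhancer"
    q: float        # FDR-adjusted p-value from the differential methylation test
    diff: float     # CV minus MBC methylation, in percentage points


def count_dm_by_direction(sites, q_max=0.1, min_abs_diff=10.0):
    """Apply q <= q_max and |diff| > min_abs_diff, then tally by element and direction."""
    counts = Counter()
    for s in sites:
        if s.q <= q_max and abs(s.diff) > min_abs_diff:
            direction = "hyper_in_CV" if s.diff > 0 else "hypo_in_CV"
            counts[(s.element, direction)] += 1
    return counts


def hyper_fraction(counts, element):
    """Share of an element's DM sites that are hypermethylated in CV."""
    hyper = counts[(element, "hyper_in_CV")]
    total = hyper + counts[(element, "hypo_in_CV")]
    return hyper / total if total else float("nan")
```

Comparing `hyper_fraction` across elements summarizes the pattern in FIG. 23 in a single number per element.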

Laser Capture Microdissection and Whole Genome Bisulfite DNA Sequencing

Laser capture microdissection (LCM) followed by whole genome bisulfite sequencing (WGBS) was performed on limited tissue samples, as shown in FIGS. 24A-24D and FIG. 25. Specifically, 39 WGBS libraries were generated from individual tissue samples following laser capture. These libraries were prepared from gut epithelial cells harvested by LCM from infants born prematurely who were treated surgically for necrotizing enterocolitis (NEC). FIGS. 24A-24D show representative before and after images of LCM. FIG. 25 shows hierarchical clustering of the resulting WGBS data.

Placental tissues are obtained in early gestation (week 12) via chorionic villus sampling and also, from the same women, after normal term delivery (week 39). Blood samples (plasma and leukocytes) are collected at the same time points from the same individuals. These individuals overlap with those sampled in Study 1. Placental and paired plasma and leukocyte samples obtained at delivery are further investigated from a group of women whose pregnancies resulted in preterm birth with confirmed placental malperfusion and inflammation. Cell type-specific analysis of these placentas improves the understanding of DNA methylation signatures associated with defined placental pathology in sPTB. Furthermore, plasma and leukocytes from these sPTB individuals are compared with plasma and leukocytes from gestational age-matched normal controls whose pregnancies are ongoing.

DNA methylation analysis methods for leukocytes and plasma are disclosed in Study 1 above. Placental tissue samples are snap frozen on dry ice and stored at −80° C. Blood samples are centrifuged, and the buffy coat and plasma are snap frozen and stored at −80° C. Laser capture is performed using protocols for microdissection, DNA extraction and WGBS as shown in FIGS. 24A-24D and FIG. 25. WGBS libraries are generated from cytotrophoblasts and syncytiotrophoblasts. These trophoblast cell types are the focus of the study because of their significant contribution to the fetoplacental component of cfDNA in maternal plasma. Individuals are matched for age, race and BMI and are non-smokers.

Placental pathologic examination and diagnostic classification were performed in a standardized manner as disclosed in Catov et al., Placenta. 2015; 36:687-692. Placental malperfusion and inflammation are among the most common lesions associated with preterm birth, and standardized, validated approaches to the classification and subclassification of placental pathologic findings have been incorporated into the presently disclosed perinatal research. This validated system makes it straightforward to identify eligible cases of preterm birth with placental malperfusion and/or inflammation.

Tissue sample sizes can be limited when targeting gestational week 12, but they yield enough DNA for effective bisulfite conversion and sequencing (see FIGS. 24A-24D and FIG. 25). Samples are also pooled from multiple individuals, and multiple pools are processed as biological replicates.

Study 3

To identify non-invasive biomarkers for the prediction of spontaneous preterm birth, existing banked plasma samples were used to compare methylation signatures in cell-free DNA obtained in early gestation (weeks 11-13) from women whose pregnancies later ended in sPTB with those from women who later delivered at term.

Study 3 identifies associations between an eventual outcome of sPTB and early gestational DNA methylation signatures in maternal plasma. This is achieved using a large cohort of plasma samples obtained from mothers who are enrolled in the ongoing NIPT research program. These samples, obtained between 11 and 13 weeks of gestation, have all been collected with IRB approval for use in the context of the current proposal and have been handled and stored in a manner that optimizes their use for non-invasive prenatal testing.

Bisulfite sequencing was performed following solution-phase hybridization capture of cell-free DNA from maternal plasma obtained at 11-13 weeks gestation from pregnant women who later developed either preterm severe preeclampsia before 32 weeks (SPE<32) (n=5) or term preeclampsia without severe symptoms (MPE) (n=5). A normal control (NC) group (n=6) was also analyzed. Individuals were matched for gestational age and fetal gender (male) and were all non-smokers. Methylation signatures in mcfDNA were thus identified in early gestation, before the onset of preeclampsia symptoms. Using the logistic regression test as implemented in the R package methylKit (Akalin et al., Genome Biol. 2012; 13:R87), and after adjusting the p-values to control the false discovery rate, 552 significantly differentially methylated CpG sites were identified that distinguish SPE<32 plasma from NC plasma, and 549 that distinguish MPE plasma from NC plasma. The most significant of these are shown in FIGS. 15A and 15B, respectively. The differentially methylated sites that characterize the two preeclampsia phenotypes partially overlap (not shown), indicating the potential for common mechanisms of disease pathogenesis. These results indicate the potential of DNA methylation analysis of mcfDNA for the molecular phenotyping of pregnant women in early gestation to identify individuals at risk of complex gestational disease.
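One standard way to adjust per-site p-values to control the false discovery rate is the Benjamini-Hochberg step-up procedure (reference 55 below). The source does not say which adjustment was applied to the methylKit output, so the following is only an illustrative, dependency-free sketch under that assumption; the function names, site IDs and p-values are hypothetical.

```python
from typing import List, Sequence


def bh_adjust(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    # Indices of sites sorted by ascending raw p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # adjusted values are capped at 1
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant_sites(site_ids: Sequence[str], pvalues: Sequence[float],
                      fdr: float = 0.05) -> List[str]:
    """IDs of CpG sites whose adjusted p-value is at or below the FDR threshold."""
    return [s for s, q in zip(site_ids, bh_adjust(pvalues)) if q <= fdr]


# Hypothetical raw p-values for eight CpG sites.
sites = ["cpg_a", "cpg_b", "cpg_c", "cpg_d", "cpg_e", "cpg_f", "cpg_g", "cpg_h"]
raw = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205]
print(significant_sites(sites, raw))
```

With these made-up values only the two smallest p-values survive at an FDR of 0.05, even though five raw p-values fall below 0.05.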

DNA is extracted and genome-wide bisulfite sequencing is carried out as disclosed in Study 1. Plasma DNA methylation is analyzed in a minimum of n=50 cases of sPTB before 34 weeks gestation (sPTB<34) with confirmed placental malperfusion and inflammation, and in n=50 normal controls (NC). A total of 3750 plasma samples have been collected and stored, and sample acquisition continues at a rate of 1150 per year toward a total of more than 8000. The bisulfite sequencing libraries are prepared and processed as disclosed in Study 1. A beta-binomial test, as implemented in the R packages methylSig and DSS, is used to identify CpG sites that are differentially methylated in plasma cfDNA collected at weeks 11-13 of gestation from women who later experienced sPTB compared with normal controls. Cases and controls are matched for gestational age, ethnicity, fetal gender, smoking history, BMI and gravidity. Exclusion criteria will include multiple gestation, maternal autoimmune disease, anti-phospholipid antibody syndrome, IVF conception, maternal pre-pregnancy diabetes and maternal pre-pregnancy chronic hypertension. The differentially methylated CpG sites are intersected with the sites whose methylation states change during pregnancy, as identified in Study 1 disclosed herein, and those differentially methylated CpG sites that are maternal- or fetal-specific, as determined in Study 1, are identified. A minimum of 50 differentially methylated sites is selected for multiplex PCR-targeted bisulfite sequencing using methods described in Chu et al., PLoS One. 2014; 9:e107318. Both elastic net and support vector machine algorithms are employed to predict the risk of sPTB from the plasma DNA methylation levels at the selected sites. Leave-one-out cross-validation is used to evaluate the performance of the elastic net and support vector machine models, and the false positive rate and false negative rate, together with their 95% confidence intervals, are estimated.
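The evaluation loop described above (leave-one-out cross-validation, then false positive and false negative rates with 95% confidence intervals) can be sketched as follows. Elastic net and SVM models are available in libraries such as scikit-learn; to keep this sketch dependency-free it substitutes a hypothetical nearest-centroid classifier, and it assumes a Wilson score interval because the source does not name an interval method. The data and all function names are made up for illustration.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]
Predictor = Callable[[Vector], int]


def nearest_centroid_fit(xs: List[Vector], ys: List[int]) -> Predictor:
    """Hypothetical stand-in for an elastic net or SVM: nearest class centroid."""
    centroids = {}
    for label in (0, 1):
        rows = [x for x, y in zip(xs, ys) if y == label]  # assumes both classes present
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def predict(x: Vector) -> int:
        dists = {lab: sum((a - b) ** 2 for a, b in zip(x, c))
                 for lab, c in centroids.items()}
        return min(dists, key=dists.get)

    return predict


def loocv_predictions(xs: List[Vector], ys: List[int],
                      fit: Callable[[List[Vector], List[int]], Predictor]) -> List[int]:
    """Refit on n-1 samples and predict the held-out sample, for every sample."""
    preds = []
    for i in range(len(xs)):
        model = fit(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        preds.append(model(xs[i]))
    return preds


def wilson_interval(k: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Approximate 95% Wilson score interval for the proportion k/n."""
    p = k / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, center - half), min(1.0, center + half)


def error_rates(ys: List[int], preds: List[int]) -> Dict[str, Tuple[float, Tuple[float, float]]]:
    """False positive and false negative rates (label 1 = later sPTB) with 95% CIs."""
    fp = sum(1 for y, p in zip(ys, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(ys, preds) if y == 1 and p == 0)
    neg, pos = ys.count(0), ys.count(1)
    return {"fpr": (fp / neg, wilson_interval(fp, neg)),
            "fnr": (fn / pos, wilson_interval(fn, pos))}


# Hypothetical methylation levels at two selected CpG sites; 1 = later sPTB, 0 = term.
X = [[0.10, 0.20], [0.15, 0.25], [0.20, 0.10], [0.80, 0.90], [0.85, 0.70], [0.90, 0.80]]
y = [0, 0, 0, 1, 1, 1]
rates = error_rates(y, loocv_predictions(X, y, nearest_centroid_fit))
print(rates)
```

Swapping in a real elastic net or SVM only requires a `fit` function with the same shape (training data in, prediction function out); the cross-validation and interval code does not change. With small groups the intervals are wide even when no errors occur, which is one reason the study sizes its cohorts at 50 per group or more.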

Power Analysis of Study 3

Using the logit transformation, methylation levels of 0.375 and 0.625 are transformed to about −0.51 and 0.51, respectively. Assuming that the within-group variance of the logit-transformed methylation levels is 1, then with 50 samples per group, a difference between methylation levels of 0.375 and 0.625 at a CpG site can be detected at a significance level of 0.0004 with a power of 0.9. Assuming that the difference in methylation level between the two groups is about 0.25 or more for only 1% of the targeted CpG sites, setting the significance level at 0.0004 controls the false discovery rate at <0.05 when detecting these differentially methylated CpG sites.
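The power statement above can be checked numerically under an assumption the text does not spell out: a two-sided, two-sample z-test (normal approximation) on logit-transformed levels with unit within-group variance. The sketch also makes explicit the step behind the FDR claim by treating the expected FDR as expected false positives divided by all expected positives when 1% of sites truly differ. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def logit(p: float) -> float:
    """Logit transform of a methylation level p in (0, 1)."""
    return math.log(p / (1 - p))


def two_sample_power(delta: float, sd: float, n_per_group: int, alpha: float) -> float:
    """Approximate power of a two-sided two-sample z-test (normal approximation)."""
    nd = NormalDist()
    se = sd * math.sqrt(2 / n_per_group)      # standard error of the group difference
    z_crit = nd.inv_cdf(1 - alpha / 2)        # two-sided critical value
    shift = abs(delta) / se
    return nd.cdf(shift - z_crit) + nd.cdf(-shift - z_crit)


def expected_fdr(alpha: float, power: float, frac_true: float) -> float:
    """Expected false discovery proportion over many independently tested sites."""
    false_pos = (1 - frac_true) * alpha
    true_pos = frac_true * power
    return false_pos / (false_pos + true_pos)


lo, hi = logit(0.375), logit(0.625)
power = two_sample_power(hi - lo, sd=1.0, n_per_group=50, alpha=0.0004)
fdr = expected_fdr(alpha=0.0004, power=power, frac_true=0.01)
print(f"logits: {lo:.2f}, {hi:.2f}; power: {power:.3f}; expected FDR: {fdr:.3f}")
```

Under these assumptions the computed power is about 0.94, which meets the stated power of 0.9, and the expected FDR is about 0.04, below 0.05. A t-test or a beta-binomial model would give somewhat different numbers.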

One potential challenge is that phenotypic heterogeneity may reduce the sensitivity of the methods disclosed herein for identifying differentially methylated CpG sites with the false discovery rate controlled at 0.05. To address this problem, large numbers of appropriate maternal plasma samples and associated clinical outcome data are obtained, which allows stringent inclusion and exclusion criteria to be applied when selecting the study cohorts. To further mitigate this risk, the logit-transformed preeclamptic methylation data are clustered using the non-negative matrix factorization method as implemented in the Python module Nimfa, and a beta-binomial test is then performed between each subgroup of preeclamptic plasma samples and the normal plasma samples to identify differentially methylated CpG sites.
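The subgrouping step can be illustrated with a small, dependency-free non-negative matrix factorization using the Lee-Seung multiplicative update rules. This is a stand-in for the Nimfa module named above, not its API. Because NMF requires non-negative input and logit values can be negative, the sketch shifts the matrix by its global minimum; that shift, the rank, the iteration count and the example matrix are all assumptions. Each sample is assigned to the component with the largest coefficient, and each resulting subgroup would then be compared with the normal samples using the beta-binomial test.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(r) for r in zip(*a)]


def nmf(v: Matrix, rank: int, iters: int = 500, seed: int = 0,
        eps: float = 1e-9) -> Tuple[Matrix, Matrix]:
    """Factor v (sites x samples) as W @ H with Lee-Seung multiplicative updates."""
    rng = random.Random(seed)
    n_feat, n_samp = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n_feat)]
    h = [[rng.random() + 0.1 for _ in range(n_samp)] for _ in range(rank)]
    for _ in range(iters):
        wt = transpose(w)  # H <- H * (W^T V) / (W^T W H)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(n_samp)]
             for k in range(rank)]
        ht = transpose(h)  # W <- W * (V H^T) / (W H H^T)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n_feat)]
    return w, h


def cluster_samples(logit_matrix: Matrix, rank: int, **kwargs) -> List[int]:
    """Assign each sample (column) to the NMF component with the largest H weight."""
    # Assumption: shift by the global minimum so the input is non-negative.
    shift = min(min(row) for row in logit_matrix)
    v = [[x - shift for x in row] for row in logit_matrix]
    _, h = nmf(v, rank, **kwargs)
    return [max(range(rank), key=lambda k: h[k][j]) for j in range(len(v[0]))]


# Hypothetical logit methylation levels: rows are CpG sites, columns are case samples.
example = [
    [2.0, 1.8, 2.2, -2.0, -1.9, -2.1],
    [1.9, 2.1, 2.0, -2.2, -2.0, -1.8],
    [2.2, 2.0, 1.9, -1.9, -2.1, -2.0],
    [-2.0, -1.9, -2.1, 2.0, 1.8, 2.2],
    [-2.2, -2.0, -1.8, 1.9, 2.1, 2.0],
    [-1.9, -2.1, -2.0, 2.2, 2.0, 1.9],
]
labels = cluster_samples(example, rank=2)
print(labels)
```

In practice the rank (number of subgroups) would itself need to be chosen, for example by comparing factorization stability across ranks; the sketch simply fixes it at 2.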

REFERENCES

-   1. Chu T, Shaw P A, Yeniterzi S, Dunkel M, Rajkovic A, Hogge W A, et al. Comparative evaluation of the minimally-invasive karyotyping (MINK) algorithm for non-invasive prenatal testing. PLoS One. 2017; 12:e0171882
-   2. Rabinowitz M, Valenti E, Pettersen B, Sigurjonsson S, Hill M, Zimmermann B. Noninvasive aneuploidy detection by multiplexed amplification and sequencing of polymorphic loci. Obstet Gynecol. 123 Suppl 1:167S
-   3. Jiang P, Tong Y K, Sun K, Cheng S H, Leung T Y, Chan K C, et al. Gestational age assessment by methylation and size profiling of maternal plasma DNA: A feasibility study. Clin Chem. 2017; 63:606-608
-   4. Sun K, Lun F M F, Leung T Y, Chiu R W K, Lo Y M D, Sun H. Noninvasive reconstruction of placental methylome from maternal plasma DNA: Potential for prenatal testing and monitoring. Prenat Diagn. 2018; 38:196-203
-   5. Chu T, Bunce K, Shaw P, Shridhar V, Althouse A, Hubel C, et al. Comprehensive analysis of preeclampsia-associated DNA methylation in the placenta. PLoS One. 2014; 9:e107318
-   6. Han Y, Yang Z, Ding X, Yu H, Yi Y. Variation of long-chain 3-hydroxyacyl-CoA dehydrogenase DNA methylation in placenta of different preeclampsia-like mouse models. Zhonghua Fu Chan Ke Za Zhi. 2015; 50:740-746
-   7. Mayne B T, Leemaqz S Y, Smith A K, Breen J, Roberts C T, Bianco-Miotto T. Accelerated placental aging in early onset preeclampsia pregnancies identified by DNA methylation. Epigenomics. 2017; 9:279-289
-   8. van den Berg C B, Chaves I, Herzog E M, Willemsen S P, van der Horst G T J, Steegers-Theunissen R P M. Early- and late-onset preeclampsia and the DNA methylation of circadian clock and clock-controlled genes in placental and newborn tissues. Chronobiol Int. 2017; 34:921-932
-   9. Xuan J, Jing Z, Yuanfang Z, Xiaoju H, Pei L, Guiyin J, et al. Comprehensive analysis of DNA methylation and gene expression of placental tissue in preeclampsia patients. Hypertens Pregnancy. 2016; 35:129-138
-   10. Ye Y, Tang Y, Xiong Y, Feng L, Li X. Bisphenol A exposure alters placentation and causes preeclampsia-like features in pregnant mice involved in reprogramming of DNA methylation of WNT2. FASEB J. 2019; 33:2732-2742
-   11. Yeung K R, Chiu C L, Pidsley R, Makris A, Hennessy A, Lind J M. DNA methylation profiles in preeclampsia and healthy control placentas. Am J Physiol Heart Circ Physiol. 2016; 310:H1295-1303
-   12. Zhang L, Leng M, Li Y, Yuan Y, Yang B, Li Y, et al. Altered DNA methylation and transcription of WNT2 and DKK1 genes in placentas associated with early-onset preeclampsia. Clin Chim Acta. 2019; 490:154-160
-   13. Barcelona de Mendoza V, Wright M L, Agaba C, Prescott L, Desir A, Crusto C A, et al. A systematic review of DNA methylation and preterm birth in African American women. Biol Res Nurs. 2017; 19:308-317
-   14. Behnia F, Parets S E, Kechichian T, Yin H, Dutta E H, Saade G R, et al. Fetal DNA methylation of autism spectrum disorders candidate genes: Association with spontaneous preterm birth. Am J Obstet Gynecol. 2015; 212:533 e531-539
-   15. Burris H H, Rifas-Shiman S L, Baccarelli A, Tarantini L, Boeke C E, Kleinman K, et al. Associations of LINE-1 DNA methylation with preterm birth in a prospective cohort study. J Dev Orig Health Dis. 2012; 3:173-181
-   16. Hong X, Sherwood B, Ladd-Acosta C, Peng S, Ji H, Hao K, et al. Genome-wide DNA methylation associations with spontaneous preterm birth in US Blacks: Findings in maternal and cord blood samples. Epigenetics. 2018; 13:163-172
-   17. Liu Y, Hoyo C, Murphy S, Huang Z, Overcash F, Thompson J, et al. DNA methylation at imprint regulatory regions in preterm birth and infection. Am J Obstet Gynecol. 2013; 208:395 e391-397
-   18. Menon R, Conneely K N, Smith A K. DNA methylation: An epigenetic risk factor in preterm birth. Reprod Sci. 2012; 19:6-13
-   19. Parets S E, Conneely K N, Kilaru V, Fortunato S J, Syed T A, Saade G, et al. Fetal DNA methylation associates with early spontaneous preterm birth and gestational age. PLoS One. 2013; 8:e67489
-   20. Parets S E, Conneely K N, Kilaru V, Menon R, Smith A K. DNA methylation provides insight into intergenerational risk for preterm birth in African Americans. Epigenetics. 2015; 10:784-792
-   21. Vidal A C, Benjamin Neelon S E, Liu Y, Tuli A M, Fuemmeler B F, Hoyo C, et al. Maternal stress, preterm birth, and DNA methylation at imprint regulatory sequences in humans. Genet Epigenet. 2014; 6:37-44
-   22. Wang X M, Tian F Y, Fan L J, Xie C B, Niu Z Z, Chen W Q. Comparison of DNA methylation profiles associated with spontaneous preterm birth in placenta and cord blood. BMC Med Genomics. 2019; 12:1
-   23. Chu T, Handley D, Bunce K, Surti U, Hogge W A, Peters D G. Structural and regulatory characterization of the placental epigenome at its maternal interface. PLoS One. 2011; 6:e14723
-   24. Strauss J F, 3rd, Romero R, Gomez-Lopez N, Haymond-Thornburg H, Modi B P, Teves M E, et al. Spontaneous preterm birth: Advances toward the discovery of genetic predisposition. Am J Obstet Gynecol. 2018; 218:294-314 e292
-   25. Heng Y J, Pennell C E, McDonald S W, Vinturache A E, Xu J, Lee M W, et al. Maternal whole blood gene expression at 18 and 28 weeks of gestation associated with spontaneous preterm birth in asymptomatic women. PLoS One. 2016; 11:e0155191
-   26. Ngo T T M, Moufarrej M N, Rasmussen M H, Camunas-Soler J, Pan W, Okamoto J, et al. Noninvasive blood tests for fetal development predict gestational age and preterm delivery. Science. 2018; 360:1133-1136
-   27. Karlas T, Weise L, Kuhn S, Krenzien F, Mehdorn M, Petroff D, et al. Correlation of cell-free DNA plasma concentration with severity of non-alcoholic fatty liver disease. J Transl Med. 2017; 15:106
-   28. Hardy T, Zeybel M, Day C P, Dipper C, Masson S, McPherson S, et al. Plasma DNA methylation: A potential biomarker for stratification of liver fibrosis in non-alcoholic fatty liver disease. Gut. 2017; 66:1321-1328
-   29. Liggett T, Melnikov A, Tilwalli S, Yi Q, Chen H, Replogle C, et al. Methylation patterns of cell-free plasma DNA in relapsing-remitting multiple sclerosis. J Neurol Sci. 2010; 290:16-21
-   30. Ramaswamy S, Tamayo P, Rifkin R, Mukherjee S, Yeang C H, Angelo M, et al. Multiclass cancer diagnosis using tumor gene expression signatures. Proc Natl Acad Sci USA. 2001; 98:15149-15154
-   31. DeRisi J, Penland L, Brown P O, Bittner M L, Meltzer P S, Ray M, et al. Use of a cDNA microarray to analyse gene expression patterns in human cancer. Nat Genet. 1996; 14:457-460
-   32. Chen B, Dias P, Jenkins J J, 3rd, Savell V H, Parham D M. Methylation alterations of the MyoD1 upstream region are predictive of subclassification of human rhabdomyosarcomas. Am J Pathol. 1998; 152:1071-1079
-   33. Ghosh R K, Pandey T, Dey P. Liquid biopsy: A new avenue in pathology. Cytopathology. 2018
-   34. Shen S Y, Singhania R, Fehringer G, Chakravarthy A, Roehrl M H A, Chadwick D, et al. Sensitive tumour detection and classification using plasma cell-free DNA methylomes. Nature. 2018; 563:579-583
-   35. Koh W, Pan W, Gawad C, Fan H C, Kerchner G A, Wyss-Coray T, et al. Noninvasive in vivo monitoring of tissue-specific global gene expression in humans. Proc Natl Acad Sci USA. 2014; 111:7361-7366
-   36. Lapaire O, Grill S, Lalevee S, Kolla V, Hosli I, Hahn S. Microarray screening for novel preeclampsia biomarker candidates. Fetal Diagn Ther. 2012; 31:147-153
-   37. Winn V D, Gormley M, Fisher S J. The impact of preeclampsia on gene expression at the maternal-fetal interface. Pregnancy Hypertens. 2011; 1:100-108
-   38. Kang J H, Song H, Yoon J A, Park D Y, Kim S H, Lee K J, et al. Preeclampsia leads to dysregulation of various signaling pathways in placenta. J Hypertens. 2011; 29:928-936
-   39. Chaouat G, Rodde N, Petitbarat M, Bulla R, Rahmati M, Dubanchet S, et al. An insight into normal and pathological pregnancies using large-scale microarrays: Lessons from microarrays. J Reprod Immunol. 2011; 89:163-172
-   40. Varkonyi T, Nagy B, Fule T, Tarca A L, Karaszi K, Schonleber J, et al. Microarray profiling reveals that placental transcriptomes of early-onset HELLP syndrome and preeclampsia are similar. Placenta. 2011; 32 Suppl:S21-29
-   41. Yuen R K, Penaherrera M S, von Dadelszen P, McFadden D E, Robinson W P. DNA methylation profiling of human placentas reveals promoter hypomethylation of multiple genes in early-onset preeclampsia. Eur J Hum Genet. 2010; 18:1006-1012
-   42. Blair J D, Yuen R K, Lim B K, McFadden D E, von Dadelszen P, Robinson W P. Widespread DNA hypomethylation at gene enhancer regions in placentas associated with early-onset pre-eclampsia. Mol Hum Reprod. 2013; 19:697-708
-   43. Fan H C, Blumenfeld Y J, Chitkara U, Hudgins L, Quake S R. Noninvasive diagnosis of fetal aneuploidy by shotgun sequencing DNA from maternal blood. Proc Natl Acad Sci USA. 2008; 105:16266-16271
-   44. Lo Y M, Lun F M, Chan K C, Tsui N B, Chong K C, Lau T K, et al. Digital PCR for the molecular detection of fetal chromosomal aneuploidy. Proc Natl Acad Sci USA. 2007; 104:13116-13121
-   45. Lun F M, Chiu R W, Allen Chan K C, Yeung Leung T, Kin Lau T, Dennis Lo Y M. Microfluidics digital PCR reveals a higher than expected fraction of fetal DNA in maternal plasma. Clin Chem. 2008; 54:1664-1672
-   46. Chu T, Yeniterzi S, Rajkovic A, Hogge W A, Dunkel M, Shaw P, et al. High resolution non-invasive detection of a fetal microdeletion using the GCREM algorithm. Prenat Diagn. 2014; 34:469-477
-   47. Peters D, Chu T, Yatsenko S A, Hendrix N, Hogge W A, Surti U, et al. Noninvasive prenatal diagnosis of a fetal microdeletion syndrome. N Engl J Med. 2011; 365:1847-1848
-   48. Yatsenko S A, Peters D G, Saller D N, Chu T, Clemens M, Rajkovic A. Maternal cell-free DNA-based screening for fetal microdeletion and the importance of careful diagnostic follow-up. Genet Med. 2015
-   49. Lau T K, Cheung S W, Lo P S, Pursley A N, Chan M K, Jiang F, et al. Non-invasive prenatal testing for fetal chromosomal abnormalities by low-coverage whole-genome sequencing of maternal plasma DNA: Review of 1982 consecutive cases in a single center. Ultrasound Obstet Gynecol. 2014; 43:254-264
-   50. Catov J M, Peng Y, Scifres C M, Parks W T. Placental pathology measures: Can they be rapidly and reliably integrated into large-scale perinatal studies? Placenta. 2015; 36:687-692
-   51. Catov J M, Scifres C M, Caritis S N, Bertolet M, Larkin J, Parks W T. Neonatal outcomes following preterm birth classified according to placental features. Am J Obstet Gynecol. 2017; 216:411 e411-411 e414
-   52. Krueger F, Andrews S R. Bismark: A flexible aligner and methylation caller for Bisulfite-Seq applications. Bioinformatics. 2011; 27:1571-1572
-   53. Park Y, Figueroa M E, Rozek L S, Sartor M A. MethylSig: A whole genome DNA methylation analysis pipeline. Bioinformatics. 2014; 30:2414-2422
-   54. Park Y, Wu H. Differential methylation analysis for BS-seq data under general experimental design. Bioinformatics. 2016; 32:1446-1453
-   55. Benjamini Y, Hochberg Y. Controlling the false discovery rate: A practical and powerful approach to multiple testing. J Roy Stat Soc B Met. 1995; 57:289-300
-   56. Bunce K, Chu T, Surti U, Hogge W A, Peters D G. Discovery of epigenetic biomarkers for the noninvasive diagnosis of fetal disease. Prenat Diagn. 2012; 32:542-549
-   57. Chu T, Burke B, Bunce K, Surti U, Allen Hogge W, Peters D G. A microarray-based approach for the identification of epigenetic biomarkers for the noninvasive diagnosis of fetal disease. Prenat Diagn. 2009; 29:1020-1030
-   58. Alberry M, Maddocks D, Jones M, Abdel Hadi M, Abdel-Fattah S, Avent N, et al. Free fetal DNA in maternal plasma in anembryonic pregnancies: Confirmation that the origin is the trophoblast. Prenat Diagn. 2007; 27:415-418
-   59. Lo Y M. Fetal DNA in maternal plasma/serum: The first 5 years. Pediatr Res. 2003; 53:16-17
-   60. Akalin A, Kormaksson M, Li S, Garrett-Bakelman F E, Figueroa M E, Melnick A, et al. methylKit: A comprehensive R package for the analysis of genome-wide DNA methylation profiles. Genome Biol. 2012; 13:R87

Examples of Embodiments for Example 1

A1. A method for diagnosing, prognosing, classifying and/or monitoring a pregnancy-associated disorder in a pregnant subject comprising:

(a) obtaining a biological sample from the subject;

(b) determining the methylation status and/or level of one or more genomic loci in the biological sample;

(c) comparing the methylation status and/or level of the one or more genomic loci to a reference; and

(d) diagnosing a pregnancy-associated disorder in the subject, wherein the difference in the methylation status and/or level of the one or more genomic loci in the biological sample compared to the reference indicates the presence of the pregnancy-associated disorder in the subject.

A2. The method of embodiment A1, wherein an increase in the level of methylation of the one or more genomic loci in the biological sample indicates the presence of the pregnancy-associated disorder in the subject, or a decrease in the level of methylation of the one or more genomic loci in the biological sample indicates the presence of the pregnancy-associated disorder in the subject.

A3. The method of embodiment A1, wherein a decrease in the level of methylation of at least one of the one or more genomic loci in the biological sample and an increase in the level of methylation of at least one of the one or more genomic loci in the biological sample indicate the presence of the pregnancy-associated disorder in the subject.

A4. The method of any one of embodiments A1-A3, wherein the biological sample is selected from the group consisting of a blood sample, stool sample, saliva sample and urine sample.

A5. The method of any one of embodiments A1-A4, wherein the biological sample is obtained from the pregnant subject between week 10 and week 13 of gestation.

A6. The method of any one of embodiments A1-A5, wherein the one or more genomic loci comprise one or more CpG sites.

A7. The method of any one of embodiments A1-A6, wherein the pregnancy-associated disorder is selected from the group consisting of preeclampsia, preterm labor, preterm birth, hyperemesis gravidarum, ectopic pregnancy and intrauterine growth retardation.

A8. The method of any one of embodiments A1-A7, wherein the pregnancy-associated disorder is preeclampsia.

A9. A method for diagnosing, prognosing and/or monitoring preeclampsia in a pregnant subject comprising:

(a) obtaining a biological sample from the subject;

(b) determining the methylation status and/or level of one or more genomic loci in the biological sample;

(c) comparing the methylation status and/or level of the one or more genomic loci to a reference; and

(d) diagnosing preeclampsia in the subject,

wherein the difference in the methylation status and/or level of the one or more genomic loci in the biological sample compared to the reference indicates preeclampsia in the subject.

A10. The method of embodiment A9, wherein an increase in the level of methylation of at least one of the one or more genomic loci in the biological sample indicates the presence of the pregnancy-associated disorder in the subject, or a decrease in the level of methylation of at least one of the one or more genomic loci in the biological sample indicates the presence of the pregnancy-associated disorder in the subject.

A11. The method of embodiment A9, wherein a decrease in the level of methylation of at least one of the one or more genomic loci in the biological sample and an increase in the level of methylation of at least one of the one or more genomic loci in the biological sample indicate the presence of the pregnancy-associated disorder in the subject.

A12. The method of any one of embodiments A9-A11, wherein the biological sample is a blood sample, stool sample, saliva sample or urine sample.

A13. The method of any one of embodiments A9-A12, wherein the biological sample is obtained from the pregnant subject between week 10 and week 13 of gestation.

A14. The method of any one of embodiments A9-A13, wherein the one or more genomic loci comprise one or more CpG sites.

A15. A method for determining if a pregnant subject is at increased risk of having a preterm birth comprising:

(a) obtaining a biological sample from the subject;

(b) determining the methylation status and/or level of one or more genomic loci in the biological sample;

(c) comparing the methylation status and/or level of the one or more genomic loci to a reference; and

(d) determining that the subject is at an increased risk of preterm birth,

wherein the difference in the methylation status and/or level of the one or more genomic loci in the biological sample compared to the reference indicates that the subject is at an increased risk of preterm birth.

A16. The method of embodiment A15, wherein an increase in the level of methylation of the one or more genomic loci in the biological sample indicates that the subject is at an increased risk of preterm birth, or a decrease in the level of methylation of the one or more genomic loci in the biological sample indicates that the subject is at an increased risk of preterm birth.

A17. The method of embodiment A15, wherein an increase in the level of methylation of at least one of the one or more genomic loci in the biological sample indicates that the subject is at an increased risk of preterm birth and a decrease in the level of methylation of at least one of the one or more genomic loci in the biological sample indicates that the subject is at an increased risk of preterm birth.

A18. The method of any one of embodiments A15-A17, wherein the biological sample is a blood sample, stool sample, saliva sample or urine sample.

A19. The method of any one of embodiments A15-A18, wherein the biological sample is obtained from the pregnant subject between week 10 and week 13 of gestation.

A20. The method of any one of embodiments A15-A19, wherein the one or more genomic loci comprise one or more CpG sites.

A21. A method of treating a pregnancy-associated disorder in a pregnant subject comprising:

(a) obtaining a biological sample from the subject;

(b) determining the methylation status and/or level of one or more genomic loci present in the biological sample;

(c) comparing the methylation status and/or level of the one or more genomic loci to a reference;

(d) diagnosing a pregnancy-associated disorder in the subject, wherein the difference in the methylation status and/or level of the one or more genomic loci in the biological sample compared to the reference indicates the presence of the pregnancy-associated disorder in the subject; and

(e) treating the subject diagnosed with the pregnancy-associated disorder.

A22. A method of treating a pregnancy-associated disorder in a pregnant subject comprising:

(a) diagnosing a pregnancy-associated disorder in the subject by utilization of the algorithm disclosed in Example Embodiment C; and

(b) treating the subject diagnosed with the pregnancy-associated disorder.

A23. The method of embodiment A21 or A22, wherein the pregnancy-associated disorder is selected from the group consisting of preeclampsia, preterm labor, preterm birth, hyperemesis gravidarum, ectopic pregnancy and intrauterine growth retardation.

A24. The method of embodiment A21, A22 or A23, wherein the pregnancy-associated disorder is preeclampsia.

A25. The method of embodiment A24, wherein treating the subject diagnosed with preeclampsia comprises one or more of the following:

(a) administration of an anti-hypertensive medication;

(b) delivery;

(c) administration of a corticosteroid;

(d) administration of an HMG-CoA reductase inhibitor; and/or

(e) administration of an anti-convulsant medication.

A26. The method of any one of embodiments A8-A14, A24 and A25, wherein the reference is the methylation status and/or level of the one or more genomic loci in a biological sample obtained from a pregnant subject that does not have preeclampsia or from a non-pregnant subject.

A27. The method of any one of embodiments A1-A7 and A21, wherein the reference is the methylation status and/or level of the one or more genomic loci in a biological sample obtained from a pregnant subject that does not have the pregnancy-associated disorder or from a non-pregnant subject.

A28. The method of any one of embodiments A1-A27, wherein the subject is human.

A29. The method of any one of embodiments A1-A28, wherein the one or more genomic loci are present within maternal nucleic acids isolated from the biological sample.

A30. The method of embodiment A29, wherein the maternal nucleic acids are obtained from leukocytes in the biological sample.

A31. The method of embodiment A29, wherein the maternal nucleic acids are cell-free nucleic acids in the biological sample.

A32. The method of embodiment A31, wherein the maternal nucleic acids are placental nucleic acids.

A33. The method of any one of embodiments A1-A28, wherein the one or more genomic loci are present within fetal nucleic acids isolated from the biological sample.

A34. The method of embodiment A33, wherein the fetal nucleic acids are cell-free nucleic acids in the biological sample.

A35. A method for diagnosing, prognosing and/or monitoring a pregnancy-associated disorder in a pregnant subject comprising:

(a) obtaining a biological sample from the subject;

(b) determining the fraction of fetal nucleic acid (fetal fraction) in the biological sample;

(c) determining the methylation status of one or more genomic loci in placental nucleic acids present in the biological sample; and

(d) diagnosing the subject with the pregnancy-associated disorder by analyzing the fetal fraction and the methylation status of the genomic loci in the placental nucleic acid.

A36. The method of embodiment A35, wherein the pregnancy-associated disorder is selected from the group consisting of preeclampsia, preterm labor, preterm birth, hyperemesis gravidarum, ectopic pregnancy and intrauterine growth retardation.

A37. The method of embodiment A35 or A36, wherein the pregnancy-associated disorder is preeclampsia.

A38. The method of any one of embodiments A35-A37, wherein the biological sample is a blood sample, stool sample, saliva sample or urine sample.

A39. The method of any one of embodiments A35-A37, wherein the methylation status is the methylation rate of the one or more genomic loci.

A40. The method of any one of embodiments A35-A39, wherein the fetal fraction is determined by:

(a) analyzing the methylation status of one or more reference genomic loci in the biological sample;

(b) analyzing the methylation status of the one or more reference genomic loci in a reference sample of maternal blood cells; and

(c) analyzing the methylation status of the one or more reference genomic loci in a reference sample of placental nucleic acids.

A41. The method of any one of embodiments A35-A40, wherein the methylation status of the one or more genomic loci is determined by:

(a) analyzing the methylation status of the one or more genomic loci in the biological sample; and

(b) analyzing the methylation status of the one or more genomic loci in a reference sample of maternal blood cells.

A42. A kit for diagnosing, prognosing and/or monitoring a pregnancy-associated disorder in a pregnant subject comprising a means for determining and/or detecting the methylation status of one or more genomic loci.

A43. The kit of embodiment A42, wherein the means comprises one or more primers and/or probes for determining and/or detecting the methylation status of the one or more genomic loci.
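Embodiment A40 does not specify how the three methylation measurements are combined into a fetal fraction. One simple possibility, assuming that plasma methylation at each reference locus is a linear mixture of placental and maternal blood-cell methylation, p ≈ f·q + (1 − f)·m, is a least-squares estimate of f across loci. The sketch below illustrates that assumption only; the function name and the example values are hypothetical.

```python
from typing import Sequence


def estimate_fetal_fraction(plasma: Sequence[float],
                            maternal: Sequence[float],
                            placental: Sequence[float]) -> float:
    """Least-squares fetal fraction under an assumed two-component mixture model.

    Model (assumption): plasma_i - maternal_i = f * (placental_i - maternal_i).
    """
    if not (len(plasma) == len(maternal) == len(placental)):
        raise ValueError("all inputs must cover the same reference loci")
    num = 0.0
    den = 0.0
    for p, m, q in zip(plasma, maternal, placental):
        d = q - m  # placental-minus-maternal methylation contrast at this locus
        num += (p - m) * d
        den += d * d
    if den == 0:
        raise ValueError("reference loci must differ between placenta and maternal cells")
    return min(1.0, max(0.0, num / den))  # clamp to a valid fraction


# Hypothetical methylation levels at four reference loci.
maternal = [0.9, 0.1, 0.8, 0.2]
placental = [0.2, 0.7, 0.3, 0.9]
plasma = [0.83, 0.16, 0.75, 0.27]  # constructed to correspond to f = 0.1
print(estimate_fetal_fraction(plasma, maternal, placental))
```

Loci with large placental-versus-maternal contrast carry the most weight in this estimate, which is why reference loci that are strongly differentially methylated between placenta and maternal blood cells are the natural choice.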

Abstract of this Example (Example 1)

This Example provides methods for diagnosing, prognosing, monitoring and/or treating pregnancy-associated disorders, e.g., preeclampsia, in a pregnant subject. For example, but not by way of limitation, the methods disclosed in this Example include determining the methylation status of one or more genomic loci in a biological sample of a pregnant subject. This Example further provides algorithms and kits for performing such methods.

Example 2—Methods of Assessing DNA Methylation in Cerebrospinal Fluid for Phenotyping and Diagnosis of Central Nervous System Disorders

Introduction of this Example

This Example provides methods, algorithms and kits for diagnosing, prognosing, monitoring, classifying and/or treating central nervous system (“CNS”) disorders.

Background of this Example

Brain and spinal cord biopsies have been used to diagnose CNS tumors, infections, inflammation, and other CNS disorders. This procedure involves removing a small piece of brain or spinal cord tissue to diagnose abnormalities of the CNS, and it can also be used to identify brain- or spinal cord-specific molecular phenotypes. However, brain and spinal cord biopsy is an invasive procedure that carries the risks associated with anesthesia and surgery, such as CNS injury, seizures, other complications, and death.

Liquid biopsy is a non-invasive method for diagnosis and phenotyping via the analysis of circulating cell-free nucleic acids. Liquid biopsy has been used in reproductive genetics for detecting fetal aneuploidy (Chu T et al., Bioinformatics. 2009; 25(10):1244-50; Fan H C et al., Proc Natl Acad Sci USA. 2008; 105(42):16266-71; Chiu R W et al., Proc Natl Acad Sci USA. 2008; 105(51):20458-63), fetal microdeletions and duplications (Chu T et al., Bioinformatics. 2009; 25(10):1244-50; Peters D et al., N Engl J Med. 365(19):1847-8; Chu T et al., Prenat Diagn. 2014; 34(5):469-77; Chu T et al., PLoS One. 2016; 11(6):e0153182), and single nucleotide variants (Camunas-Soler J et al., Clin Chem. 2018; 64(2):336-45) by sequencing DNA obtained from maternal plasma. Liquid biopsy has also been used for detecting and quantifying mutations in tumor-derived DNA obtained from plasma, and other fluid reservoirs including cerebrospinal fluid (“CSF”) (De Rubis G et al., Trends Pharmacol Sci. 2019; Muinelo-Romay L et al., Int J Mol Sci. 2018; 19(8); Seoane J et al., Ann Oncol. 2018; Hiemcke-Jiwa L et al., Crit Rev Oncol Hematol. 2018; 127:56-65; Hiemcke-Jiwa L S et al., Hematol Oncol. 2018; 36(2):429-35).

DNA methylation is involved in the regulation of gene expression (Ball M P et al., Nature Biotechnology. 2009; 27(4):361-8), and DNA methylation patterns can be altered as a consequence of environmental exposure and are a central component of disease pathogenesis (van Vliet J et al., CMLS. 2007; 64(12):1531-8; Abdolmaleky H M et al., Human Molecular Genetics. 2006; 15(21):3132-45). However, very little is understood about the potential of DNA methylation in cerebrospinal fluid to provide non-invasive diagnostic and phenotypic information on CNS disorders.

Therefore, there remains a need for non-invasive methods for diagnosing CNS disorders.

Summary of this Example

This Example provides methods for diagnosing, prognosing, monitoring, classifying and/or treating CNS disorders, e.g., brain and spinal cord tumors, brain and spinal cord infections, brain and spinal cord inflammations, neuropsychiatric disorders, and neurodegenerative diseases. For example, but not by way of limitation, the methods of this Example include determining the methylation status of one or more genomic loci in a cerebrospinal fluid sample of a subject. This Example further provides algorithms and kits for diagnosing, prognosing, monitoring, classifying and/or treating CNS disorders.

In one aspect, this Example provides a method for diagnosing, prognosing, classifying and/or monitoring a CNS disorder in a subject comprising: (a) obtaining a cerebrospinal fluid sample from the subject; (b) determining the methylation status and/or level of one or more genomic loci in the cerebrospinal fluid sample; (c) comparing the methylation status and/or level of the one or more genomic loci to a reference; and (d) diagnosing the CNS disorder in the subject, wherein the difference in the methylation status and/or level of the one or more genomic loci in the cerebrospinal fluid sample compared to the reference indicates the presence of the CNS disorder in the subject.

In another aspect, this Example provides a method of treating a CNS disorder in a subject comprising: (a) obtaining a cerebrospinal fluid sample from the subject; (b) determining the methylation status and/or level of one or more genomic loci present in the cerebrospinal fluid sample; (c) comparing the methylation status and/or level of the one or more genomic loci to a reference; (d) diagnosing a CNS disorder in the subject, wherein the difference in the methylation status and/or level of the one or more genomic loci in the cerebrospinal fluid sample compared to the reference indicates the presence of the CNS disorder in the subject; and (e) treating the subject diagnosed with the CNS disorder.

In certain embodiments, an increase in the level of methylation of the one or more genomic loci in the cerebrospinal fluid sample indicates the presence of the CNS disorder in the subject or a decrease in the level of methylation of the one or more genomic loci in the cerebrospinal fluid sample indicates the presence of the CNS disorder in the subject. In certain embodiments, a decrease in the level of methylation of at least one of the one or more genomic loci in the cerebrospinal fluid sample and an increase in the level of methylation of at least one of the one or more genomic loci in the cerebrospinal fluid sample indicates the presence of the CNS disorder in the subject.

In certain embodiments, the reference is the methylation status and/or level of the one or more genomic loci in a cerebrospinal fluid sample obtained from a subject that does not have the CNS disorder.

In another aspect, this Example provides a method of treating a CNS disorder in a subject comprising: (a) measuring the methylation status and/or level of one or more genomic loci present in a cerebrospinal fluid sample from the subject prior to a treatment of the CNS disorder; (b) measuring the methylation status and/or level of one or more genomic loci present in a cerebrospinal fluid sample from the subject during the treatment of the CNS disorder; and (c) continuing the treatment if the difference in the methylation status and/or level of the one or more genomic loci between the cerebrospinal fluid samples from prior to and during the treatment of the CNS disorder indicates the subject is responsive to the treatment. In certain embodiments, the method further comprises (d) administering a different treatment to the subject if the difference in the methylation status and/or level of the one or more genomic loci between the cerebrospinal fluid samples from prior to and during the treatment of the CNS disorder indicates the subject is not responsive to the treatment.
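The monitoring steps above can be illustrated with a minimal Python sketch. The source does not specify how the pre-treatment and during-treatment measurements are combined, so the score used here (the mean per-locus change in methylation level, signed according to whether an increase or a decrease at that locus is taken to indicate response), the 0.05 threshold, and all function and locus names are illustrative assumptions rather than the disclosed method.

```python
from typing import Mapping


def response_score(
    before: Mapping[str, float],
    during: Mapping[str, float],
    expected_direction: Mapping[str, int],
) -> float:
    """Mean per-locus change in methylation level, oriented so that positive
    values point toward a treatment response.

    expected_direction maps each locus to +1 if an increase in methylation at
    that locus is taken to indicate response, or -1 if a decrease is.
    """
    loci = [locus for locus in expected_direction if locus in before and locus in during]
    if not loci:
        raise ValueError("no locus was measured in both samples")
    total = sum(expected_direction[locus] * (during[locus] - before[locus]) for locus in loci)
    return total / len(loci)


def treatment_decision(
    before: Mapping[str, float],
    during: Mapping[str, float],
    expected_direction: Mapping[str, int],
    min_score: float = 0.05,  # hypothetical threshold, not taken from the source
) -> str:
    """Return 'continue' if the oriented change reaches min_score, else 'switch'."""
    if response_score(before, during, expected_direction) >= min_score:
        return "continue"
    return "switch"


before = {"locus_a": 0.20, "locus_b": 0.70}
during = {"locus_a": 0.35, "locus_b": 0.55}
direction = {"locus_a": 1, "locus_b": -1}
print(treatment_decision(before, during, direction))  # continue
```

Encoding the expected direction per locus lets one function cover both the embodiments in which all loci move the same way and those in which some loci gain and others lose methylation.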

In certain embodiments, an increase in the level of methylation of the one or more genomic loci in the cerebrospinal fluid sample indicates the subject is responsive to the treatment, or a decrease in the level of methylation of the one or more genomic loci in the cerebrospinal fluid sample indicates the subject is responsive to the treatment. In certain embodiments, a decrease in the level of methylation of at least one of the one or more genomic loci in the cerebrospinal fluid sample and an increase in the level of methylation of at least one of the one or more genomic loci in the cerebrospinal fluid sample indicates the subject is responsive to the treatment. In certain embodiments, an increase in the level of methylation of the one or more genomic loci in the cerebrospinal fluid sample indicates the subject is responsive to the treatment, or a decrease in the level of methylation of the one or more genomic loci in the cerebrospinal fluid sample indicates the subject is not responsive to the treatment. In certain embodiments, a decrease in the level of methylation of at least one of the one or more genomic loci in the cerebrospinal fluid sample and an increase in the level of methylation of at least one of the one or more genomic loci in the cerebrospinal fluid sample indicates the subject is not responsive to the treatment.

This Example further provides algorithms for diagnosing and/or monitoring a subject with a CNS disorder. In certain embodiments, the algorithm can be used to classify a CNS disorder of a subject.

In another aspect, this Example provides a kit for diagnosing, prognosing and/or monitoring a CNS disorder in a subject comprising a means for determining and/or detecting the methylation status of one or more genomic loci. In certain embodiments, the means comprises one or more primers and/or probes for determining and/or detecting the methylation status of the one or more genomic loci.

In certain embodiments, the one or more genomic loci are present within nucleic acids isolated from the cerebrospinal fluid sample. In certain embodiments, the one or more genomic loci are present within cell-free nucleic acids isolated from the cerebrospinal fluid sample.

In certain embodiments, the CNS disorder is selected from the group consisting of brain and spinal cord tumors, brain and spinal cord infections, brain and spinal cord inflammations, neuropsychiatric disorders, and neurodegenerative diseases. In certain embodiments, the subject is human.

Description of this Example

This Example provides methods for diagnosing, prognosing, monitoring, classifying and/or treating CNS disorders, e.g., brain and spinal cord tumors, brain and spinal cord infections, brain and spinal cord inflammations, neuropsychiatric disorders, and neurodegenerative diseases. For example, but not by way of limitation, the methods of this Example include determining the methylation status of one or more genomic loci in a cerebrospinal fluid sample of a subject. In certain embodiments, the methods of this Example include the use of an algorithm to diagnose, prognose, monitor, classify and/or treat CNS disorders.

Definitions of this Example

Unless defined otherwise, all technical and scientific terms used in this Example generally have their ordinary meanings in the art, within the context of this invention and in the specific context where each term is used. The following references provide one of skill with a general definition of many of the terms used in this invention: Singleton et al., Dictionary of Microbiology and Molecular Biology (2nd ed. 1994); The Cambridge Dictionary of Science and Technology (Walker ed., 1988); The Glossary of Genetics, 5th Ed., R. Rieger et al. (eds.), Springer Verlag (1991); and Hale & Marham, The Harper Collins Dictionary of Biology (1991). Certain terms are discussed below, or elsewhere in the specification, to provide additional guidance to the practitioner in describing the compositions and methods of the invention and how to make and use them.

The terms “comprise(s),” “include(s),” “having,” “has,” “can,” “contain(s),” and variants thereof, as used herein, are intended to be open-ended transitional phrases, terms or words that do not preclude the possibility of additional acts or structures. The singular forms “a,” “an” and “the” include plural references unless the context clearly dictates otherwise. The present disclosure also contemplates other embodiments “comprising,” “consisting of” and “consisting essentially of,” the embodiments or elements presented herein, whether explicitly set forth or not.

As used herein, the use of the word “a” or “an” when used in conjunction with the term “comprising” in the claims and/or the specification may mean “one,” but it is also consistent with the meaning of “one or more,” “at least one,” and “one or more than one.” Still further, the terms “having,” “including,” “containing” and “comprising” are interchangeable and one of skill in the art is cognizant that these terms are open ended terms.

The term “about” or “approximately” means within an acceptable error range for the particular value as determined by one of ordinary skill in the art, which will depend in part on how the value is measured or determined, i.e., the limitations of the measurement system. For example, “about” can mean within 3 or more than 3 standard deviations, per the practice in the art. Alternatively, “about” can mean a range of up to 20%, preferably up to 10%, more preferably up to 5%, and more preferably still up to 1% of a given value. Alternatively, particularly with respect to biological systems or processes, the term can mean within an order of magnitude, preferably within 5-fold, and more preferably within 2-fold, of a value.

In certain embodiments, the term “biomarker” refers to a marker (e.g., DNA methylation status) that allows detection of a disease and/or disorder in an individual, including detection of the disease or the disorder in its early stages. In certain embodiments, the term “biomarker” refers to a marker (e.g., DNA methylation status) that allows the characterization of a phenotype of a disease and/or a disorder in an individual. Early stage of a disease, as used herein, refers to the time period between the onset of the disease and the time point that signs or symptoms of the disease emerge. In certain non-limiting embodiments, the presence, absence and/or level of a biomarker in a cerebrospinal fluid sample of a subject is determined by comparing to a reference control.

The terms “reference sample,” “reference control,” “control” or “reference,” as used interchangeably herein, refer to a control for a methylation status of a genomic locus that is to be detected in a cerebrospinal fluid sample of a subject. In certain embodiments, a reference sample can be a sample from a healthy individual, e.g., an individual that does not have a CNS disorder (e.g., brain and spinal cord tumors, brain and spinal cord infections, brain and spinal cord inflammations, neuropsychiatric disorders, and neurodegenerative diseases). In certain embodiments, a reference sample can be a sample from a control individual that does not have the disease or phenotype to be detected by a biomarker disclosed herein. In certain embodiments, a control or reference can be the presence, absence and/or a particular level of a methylation state of a genomic locus in a healthy individual. In certain embodiments, a reference can be a predetermined presence, absence and/or particular level of a methylation state of a genomic locus that indicates a subject does not have a CNS disorder. In certain embodiments, a reference can be the methylation status of a locus in an individual having a disease or a phenotype, e.g., an individual that has a CNS disorder (e.g., brain and spinal cord tumors, brain and spinal cord infections, brain and spinal cord inflammations, neuropsychiatric disorders, and neurodegenerative diseases), where the methylation status of the locus is known not to be associated with the disease or the phenotype.

The term “CNS disorder” or “central nervous system disorder,” as used herein, refers to any condition or disease that is caused by damage to or disruption of the CNS, including the brain and spinal cord, and that results in alterations of the function or structure of the CNS. Non-limiting exemplary CNS disorders include brain and spinal cord tumors, brain and spinal cord infections (e.g., meningitis, brain abscesses and encephalitis), neurodegenerative diseases (e.g., Parkinson's disease, Alzheimer's disease, Huntington's disease, spinocerebellar ataxia, spinal muscular atrophy, motor neuron diseases and prion disease), CNS inflammations (e.g., CNS vasculitis, antibody-mediated inflammatory brain diseases, demyelinating conditions, Rasmussen's encephalitis and neurosarcoidosis, and inflammation that occurs secondary to another disease in the body, e.g., meningitis), and neuropsychiatric disorders (e.g., depression, schizophrenia and autism spectrum disorders).

The term “slightly invasive or non-invasive method” refers to a method that does not involve the removal of tissues by biopsy from the brain or spinal cord. In certain embodiments, slightly invasive or non-invasive methods, as disclosed herein, include obtaining cerebrospinal fluid from a subject by lumbar puncture or spinal tap, ventricular puncture, cisternal puncture or from a shunt or ventricular drain. In certain embodiments, slightly invasive or non-invasive methods, as disclosed herein, include obtaining cerebrospinal fluid from the lymphatic system of a subject, e.g., the lymphatic system around the nose and pharynx.

The term “patient” or “subject,” as used interchangeably herein, refers to any warm-blooded animal, e.g., human or non-human. Non-limiting examples of non-human subjects include non-human primates, dogs, cats, mice, rats, guinea pigs, rabbits, fowl, pigs, horses, cows, goats, sheep, etc. In certain embodiments, the subject is human.

The term “nucleic acid,” “nucleic acid molecule” or “polynucleotide” includes any compound and/or substance that comprises a polymer of nucleotides. Each nucleotide is composed of a base, specifically a purine- or pyrimidine base (i.e., cytosine (C), guanine (G), adenine (A), thymine (T) or uracil (U)), a sugar (i.e., deoxyribose or ribose) and a phosphate group. In certain embodiments, the nucleic acid molecule is described by the sequence of bases, whereby said bases represent the primary structure (linear structure) of a nucleic acid molecule. The sequence of bases is typically represented from 5′ to 3′. These terms encompass deoxyribonucleic acid (DNA) including, e.g., complementary DNA (cDNA) and genomic DNA, ribonucleic acid (RNA), in particular messenger RNA (mRNA), synthetic forms of DNA or RNA, and mixed polymers comprising two or more of these molecules. The herein described nucleic acid molecule can contain naturally occurring or non-naturally occurring nucleotides. Examples of non-naturally occurring nucleotides include modified nucleotide bases with derivatized sugars or phosphate backbone linkages or chemically modified residues.

The term “isolated” (e.g., isolated genomic DNA) refers to a biological component that has been substantially separated or purified away from other biological components in the cell of the organism in which the component naturally occurs, e.g., other chromosomal and extra-chromosomal DNA and RNA, proteins and organelles. Nucleic acids, e.g., DNA, that have been “isolated” include nucleic acids purified by standard purification methods.

The term “genomic locus” or “genomic DNA locus,” as used herein, refers to any fixed position in a genome. For example, but not by way of limitation, a genomic locus can refer to a genomic element, a chromosomal region, a gene, a region of a gene, e.g., an exon or intron, a regulatory region of a gene, e.g., a promoter or enhancer, a CpG site, a CpG island or a CpG island shore. For example, but not by way of limitation, a genomic locus can include one or more CpG sites, e.g., between about 1 to about 100 CpG sites. In certain embodiments, a genomic locus can be of any particular length, e.g., between about 1 to about 10,000 nucleotides in length.

As used interchangeably herein, “methylation state,” “methylation profile,” “methylation status” and “methylation level” refer to the presence, absence, percentage and/or quantity of methylation at a particular nucleotide, or nucleotides, within a DNA region, e.g., a genomic locus. The methylation status of a particular DNA sequence (e.g., a genomic locus) can indicate the methylation state of every nucleotide in the sequence, of any of the nucleotides (e.g., cytosines) in the sequence, or of a subset of the nucleotides (e.g., a subset of cytosines); can indicate the percentage or fraction of methylated cytosines within any particular stretch of nucleotides in the sequence; or can indicate the average rate of methylation of all the cytosines (or a subset of the cytosines) present in a nucleic acid.
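As an illustration of the level-type measures listed above, the following Python sketch assumes per-CpG counts of methylated and unmethylated reads (for example, as might be tallied from bisulfite sequencing reads covering a locus). The input format and function names are hypothetical. The sketch computes a methylated fraction at each site, a pooled fraction across the locus, and an average of the per-site fractions.

```python
from typing import Dict, Tuple

# Hypothetical input: CpG position -> (methylated reads, unmethylated reads).
Counts = Dict[int, Tuple[int, int]]


def site_fractions(counts: Counts) -> Dict[int, float]:
    """Methylated fraction at each CpG site with at least one covering read."""
    return {pos: m / (m + u) for pos, (m, u) in counts.items() if m + u > 0}


def pooled_fraction(counts: Counts) -> float:
    """All methylated calls divided by all calls across the locus."""
    methylated = sum(m for m, _ in counts.values())
    total = sum(m + u for m, u in counts.values())
    if total == 0:
        raise ValueError("locus has no read coverage")
    return methylated / total


def mean_site_fraction(counts: Counts) -> float:
    """Average of per-site fractions, weighting each covered CpG equally."""
    fractions = site_fractions(counts)
    if not fractions:
        raise ValueError("locus has no read coverage")
    return sum(fractions.values()) / len(fractions)


counts = {100: (9, 1), 105: (0, 30), 130: (0, 0)}
print(site_fractions(counts))      # {100: 0.9, 105: 0.0}
print(pooled_fraction(counts))     # 0.225
print(mean_site_fraction(counts))  # 0.45
```

The pooled and averaged measures diverge when read coverage is uneven across CpG sites, as in this example, so which one is used as the methylation level of a locus is a design choice.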

As used herein, a “methylated nucleic acid molecule” refers to a nucleic acid molecule that contains one or more methylated nucleotides.

As used herein, a “CpG site” or “methylation site” is a nucleotide within a nucleic acid that is susceptible to methylation either by naturally occurring events in vivo or by an event instituted to chemically methylate the nucleotide in vitro.

A “CpG island,” as used herein, describes a segment of a nucleic acid, e.g., a DNA sequence, that has a high frequency of CpG dinucleotides. See, e.g., Illingworth and Bird, FEBS Letters 2009; 583:1713-1720. For example, but not by way of limitation, Yamada et al. (Genome Research 2004; 14:247-266) have described a set of standards for determining a CpG island: it must be at least 400 nucleotides in length, have a GC content greater than 50%, and have a ratio of observed to expected CpG frequency (OCF/ECF) greater than 0.6. Others (Takai et al., Proc. Natl. Acad. Sci. U.S.A. 2002; 99:3740-3745) have defined a CpG island less stringently as a sequence of at least 200 nucleotides in length, having a GC content greater than 50% and an OCF/ECF ratio greater than 0.6.
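The criteria above can be checked computationally for a candidate sequence. The sketch below assumes the OCF/ECF ratio is computed with the commonly used observed-to-expected formula, (number of CpG dinucleotides × sequence length) / (number of C × number of G); that formula, the function names, and the use of strict inequalities for the GC and ratio thresholds are assumptions for illustration.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C bases in the sequence."""
    s = seq.upper()
    return (s.count("G") + s.count("C")) / len(s)


def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio: (CpG count * length) / (C count * G count)."""
    s = seq.upper()
    c, g = s.count("C"), s.count("G")
    if c == 0 or g == 0:
        return 0.0
    # "CG" cannot overlap itself, so str.count gives the number of CpG dinucleotides.
    return s.count("CG") * len(s) / (c * g)


def is_cpg_island(seq: str, min_len: int, min_gc: float = 0.5, min_ratio: float = 0.6) -> bool:
    """Apply a length, GC-content and observed-to-expected CpG test."""
    if not seq or len(seq) < min_len:
        return False
    return gc_content(seq) > min_gc and cpg_obs_exp(seq) > min_ratio


window = "CG" * 100  # artificial 200-nt test sequence
print(is_cpg_island(window, min_len=200))  # True
print(is_cpg_island(window, min_len=400))  # False
```

With `min_len=400` the check follows the stricter standard described above, and with `min_len=200` the less stringent one. A genome-wide search could apply such a test over sliding windows, which this sketch omits.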

A “CpG island shore,” as used herein, refers to methylation hotspots that are present a short distance, e.g., less than 2 kb, from CpG islands.
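Given CpG island coordinates, candidate shore regions can be derived as the flanking intervals within a chosen distance of each island. This sketch assumes half-open `[start, end)` coordinates on a single chromosome and uses 2,000 nucleotides as the default distance to match the example above. The function name is hypothetical, and the sketch does not remove shore intervals that overlap neighboring islands, which a fuller implementation might do.

```python
from typing import List, Optional, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on one chromosome


def island_shores(
    islands: List[Interval],
    max_distance: int = 2000,
    chrom_length: Optional[int] = None,
) -> List[Interval]:
    """Return upstream and downstream flanks of up to max_distance for each island."""
    shores: List[Interval] = []
    for start, end in islands:
        upstream = (max(0, start - max_distance), start)
        down_end = end + max_distance
        if chrom_length is not None:
            down_end = min(down_end, chrom_length)
        downstream = (end, down_end)
        # Skip empty flanks, e.g. for an island that starts at position 0.
        for s, e in (upstream, downstream):
            if e > s:
                shores.append((s, e))
    return shores


print(island_shores([(5000, 6000)]))  # [(3000, 5000), (6000, 8000)]
```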

The term “methylome,” as used herein, refers to the amount or pattern of methylation at different sites or regions within a population of cells. The methylome can correspond to all of the genome, a subset of the genome (e.g., repeat elements in the genome) or a portion of the subset (e.g., those areas found to be associated with a CNS disorder). A methylome from cerebrospinal fluid can be referred to as a “cerebrospinal fluid methylome” or a “cerebrospinal fluid DNA methylome.” The cerebrospinal fluid methylome is an example of a cell-free methylome, since cerebrospinal fluid includes cell-free DNA (cfDNA).

As used herein, the term “increase” refers to a positive alteration of at least about 2%, including, but not limited to, a positive alteration of about 5%, about 10%, about 15%, about 20%, about 25%, about 30%, about 35%, about 40%, about 45%, about 50%, about 55%, about 60%, about 65%, about 70%, about 75%, about 80%, about 85%, about 90%, about 95% or about 100%.

As used herein, the terms “reduce,” “reduction” or “decrease” refer to a negative alteration of at least about 2%, including, but not limited to, a negative alteration of about 5%, about 10%, about 15%, about 20%, about 25%, about 30%, about 35%, about 40%, about 45%, about 50%, about 55%, about 60%, about 65%, about 70%, about 75%, about 80%, about 85%, about 90%, about 95% or about 100%.
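The following sketch shows one way to apply these definitions to a measured value and a reference value. It assumes that the percentage refers to the change relative to the reference value (rather than an absolute difference in percentage points) and uses the 2% lower bound as the default threshold; the function names are hypothetical.

```python
def relative_change(sample: float, reference: float) -> float:
    """Signed change of sample relative to reference (0.1 means +10%)."""
    if reference == 0:
        raise ValueError("relative change is undefined for a zero reference")
    return (sample - reference) / reference


def classify_change(sample: float, reference: float, min_fraction: float = 0.02) -> str:
    """Label a change as 'increase', 'decrease' or 'no change'."""
    change = relative_change(sample, reference)
    if change >= min_fraction:
        return "increase"
    if change <= -min_fraction:
        return "decrease"
    return "no change"


print(classify_change(0.55, 0.50))   # increase (+10%)
print(classify_change(0.40, 0.50))   # decrease (-20%)
print(classify_change(0.505, 0.50))  # no change (+1%)
```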

Methods of this Example

This Example provides methods for diagnosing, monitoring, classifying and/or treating CNS disorders in a subject, e.g., brain and spinal cord tumors, brain and spinal cord infections, brain and spinal cord inflammations, neuropsychiatric disorders, and neurodegenerative diseases, by analyzing the methylation status of one or more genomic loci in a cerebrospinal fluid sample of the subject. In certain embodiments, the methods can utilize the algorithm disclosed herein. In certain embodiments, the methods of this Example allow early diagnosis or screening of a subject with a CNS disorder, e.g., a subject who does not have any symptoms, or has only early symptoms, of the CNS disorder, e.g., headache, vomiting, and nausea.

In certain embodiments, the cerebrospinal fluid samples obtained for use in this Example comprise cfDNA, which carries DNA methylation information from its cell of origin. cfDNA can arise from cellular apoptosis and necrosis, and can also be generated by active secretory processes involving the formation of extracellular vesicles. DNA methylation signatures are highly tissue-specific and therefore carry in vivo information about the tissue source of cfDNA. In certain embodiments, this Example comprises analyzing cfDNA in the cerebrospinal fluid sample to identify genetic phenotypes that are drivers and/or consequences of CNS disorders.

The cerebrospinal fluid sample can be collected from the subject using any suitable method known in the art. In certain embodiments, the cerebrospinal fluid is collected by a lumbar puncture or spinal tap, in which a hollow needle is inserted into the spinal canal to collect the cerebrospinal fluid. In certain embodiments, the cerebrospinal fluid is collected by a cisternal puncture, in which a needle is placed below the occipital bone to collect the cerebrospinal fluid. In certain embodiments, the cerebrospinal fluid is collected by a ventricular puncture, in which a needle is inserted directly into one of the brain's ventricles. In certain embodiments, fluoroscopy is used to guide insertion of the needle. In certain embodiments, the cerebrospinal fluid is collected from a tube that is part of a ventriculoperitoneal shunt or an external ventricular drain.

In certain embodiments, the cerebrospinal fluid sample is collected from the subject before the subject has any symptom of the CNS disorder, i.e., from a non-symptomatic subject. In certain embodiments, the cerebrospinal fluid sample is collected from a non-symptomatic subject who is at high risk of the CNS disorder, e.g., multiple members of the subject's family have a history of cancer or the subject has previously suffered a brain trauma. In certain embodiments, the cerebrospinal fluid sample is collected from a subject who has at least one early symptom of the CNS disorder, e.g., headache, vomiting, and nausea. In certain embodiments, the cerebrospinal fluid sample is collected from a subject who has at least one symptom of the CNS disorder, e.g., persistent headache; pain in the face, back, arms, or legs; an inability to concentrate; loss of feeling; memory loss; loss of muscle strength; tremors; seizures; increased reflexes; spasticity; tics; paralysis; and slurred speech. In certain embodiments, the cerebrospinal fluid sample is collected from a subject who has previously received or is currently receiving a treatment for a CNS disorder. In certain embodiments, two or more cerebrospinal fluid samples (e.g., two or more, three or more, four or more, five or more, six or more or seven or more cerebrospinal fluid samples) can be obtained before and while the subject is receiving treatment for the CNS disorder (e.g., serially obtained samples).

Diagnostic, Prognostic, Classification, and Monitoring Methods of this Example

This Example provides diagnostic and prognostic methods for diseases and/or disorders that are characterized by differential methylation of genomic loci. For example, but not by way of limitation, this Example provides methods for diagnosing, prognosing, classifying and/or monitoring a CNS disorder in a subject that include analyzing the methylation status of certain genomic loci.

Non-limiting examples of CNS disorders that can be diagnosed, monitored and/or treated by the presently disclosed subject matter include brain and spinal cord tumors, brain and spinal cord infections (e.g., meningitis, brain abscesses and encephalitis), neurodegenerative diseases (e.g., Parkinson's disease, Alzheimer's disease, Huntington's disease, spinocerebellar ataxia, spinal muscular atrophy, motor neuron diseases and prion disease), CNS inflammations (e.g., CNS vasculitis, antibody-mediated inflammatory brain diseases, demyelinating conditions, Rasmussen's encephalitis and neurosarcoidosis, and inflammation that occurs secondary to another disease in the body, e.g., meningitis), and neuropsychiatric disorders (e.g., depression, schizophrenia and autism spectrum disorders).

In certain embodiments, the analyzed genomic loci can include one or more genomic loci that exhibit differential methylation in a cerebrospinal fluid sample from a subject that has a CNS disorder compared to a reference sample. For example, but not by way of limitation, the methods of this Example include assessing the methylation status of one or more genomic loci, e.g., about 5 or more, about 10 or more, about 50 or more, about 100 or more, about 500 or more, about 1,000 or more, about 5,000 or more, about 10,000 or more, about 25,000 or more, about 50,000 or more or about 100,000 or more genomic loci in a cerebrospinal fluid sample of a subject. In certain embodiments, the genomic loci can be selected from the genes, or a region within the genes, provided in Tables 8 & 9, FIGS. 29-32, and Example D. In certain embodiments, the one or more genomic loci can be one or more promoter regions of one or more genes, one or more exons of one or more genes, one or more introns of one or more genes, one or more CpG sites, one or more CpG islands, one or more CpG island shores, one or more enhancers of one or more genes or a combination thereof. In certain embodiments, the genomic loci are present on a particular chromosome.

In certain embodiments, this Example provides for diagnosing, prognosing and/or monitoring a CNS disorder in a subject by detecting the DNA methylation profiles associated with the CNS disorder. In certain embodiments, the methods of this Example include (a) obtaining a cerebrospinal fluid sample from the subject, (b) determining the methylation status of one or more genomic loci present in the cerebrospinal fluid sample, e.g., present within cfDNA in the cerebrospinal fluid sample, (c) comparing the methylation status of the one or more genomic loci to a reference and (d) diagnosing a CNS disorder in the subject. In certain embodiments, the difference in the methylation status of the one or more genomic loci in the cerebrospinal fluid sample compared to the reference indicates the presence of the CNS disorder in the subject. In certain embodiments, the difference in the methylation status can also indicate the severity of the CNS disorder.

In certain embodiments, the methods disclosed herein for diagnosing, prognosing and/or monitoring a CNS disorder in a subject can include (a) obtaining a cerebrospinal fluid sample from the subject, (b) determining the level of methylation of one or more genomic loci present in the cerebrospinal fluid sample, (c) comparing the level of methylation of the one or more genomic loci to a reference; and (d) diagnosing a CNS disorder in the subject. In certain embodiments, the difference in the level of methylation of the one or more genomic loci in the cerebrospinal fluid sample compared to the reference indicates the presence of the CNS disorder in the subject. In certain embodiments, the difference in the methylation level can also indicate the severity of the CNS disorder.

In certain embodiments, diagnosing a CNS disorder in the subject includes characterizing a phenotype of the CNS disorder, wherein the difference in the methylation status of the one or more genomic loci in the cerebrospinal fluid sample compared to the reference indicates the phenotype of the CNS disorder. In certain embodiments, the phenotype of the CNS disorder includes the severity of the CNS disorder, the prognosis of the CNS disorder, the molecular expression profile of the CNS disorder, the responsiveness of the CNS disorder to certain treatments, or any combination thereof.

In certain embodiments, the methods disclosed herein for determining whether a subject is at risk of developing a CNS disorder can include (a) obtaining a cerebrospinal fluid sample from the subject, (b) determining the level of methylation of one or more genomic loci present in the cerebrospinal fluid sample, (c) comparing the level of methylation of the one or more genomic loci to a reference; and (d) determining that the subject is at risk of developing a CNS disorder, wherein the difference in the level of methylation of the one or more genomic loci in the cerebrospinal fluid sample compared to the reference indicates that the subject is at risk.

In certain embodiments, diagnosing, prognosing and/or monitoring of a subject with a CNS disorder can be based on a higher or lower methylation level of a genomic locus in the cerebrospinal fluid sample of the subject relative to the methylation level in a reference sample, e.g., a cerebrospinal fluid sample from a subject that does not have the CNS disorder. In certain embodiments, a difference of greater than about 5%, greater than about 10%, greater than about 15%, greater than about 20%, greater than about 25%, greater than about 30%, greater than about 35%, greater than about 40%, greater than about 45%, greater than about 50%, greater than about 55%, greater than about 60%, greater than about 65%, greater than about 70%, greater than about 75%, greater than about 80%, greater than about 85%, greater than about 90% or greater than about 95% in the methylation (e.g., level, percentage and/or fraction) of the one or more genomic loci in a cerebrospinal fluid sample obtained from a subject compared to a control is indicative that the subject has the CNS disorder or is at risk of developing the CNS disorder. In certain embodiments, the difference can be a decrease in methylation (e.g., level, percentage and/or fraction) of the genomic loci in the cerebrospinal fluid sample of the subject. Alternatively, the difference can be an increase in methylation (e.g., level, percentage and/or fraction) of the genomic loci in the cerebrospinal fluid sample of the subject. In certain embodiments, the difference can be a decrease in the methylation of a genomic locus and an increase in the methylation of a different genomic locus in the sample obtained from the subject. In certain embodiments, a decrease in the level of methylation of one or more genomic loci in the cerebrospinal fluid sample and an increase in the level of methylation of one or more different genomic loci in the cerebrospinal fluid sample together indicate the presence of the CNS disorder.
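A minimal Python sketch of the comparison described above follows. It assumes methylation is expressed as a fraction between 0 and 1 for each locus and that the difference is the absolute difference in that fraction, so the 0.05 default reads the smallest listed difference as 5 percentage points. The `min_loci` decision rule, the function names, and the locus names are hypothetical examples, not disclosed criteria. Loci that gain and loci that lose methylation are both reported, consistent with the embodiments in which a decrease at some loci and an increase at others indicate the disorder.

```python
from typing import Dict, Mapping


def differential_loci(
    sample: Mapping[str, float],
    reference: Mapping[str, float],
    min_diff: float = 0.05,
) -> Dict[str, str]:
    """Call loci whose methylation fraction differs from the reference by more than min_diff.

    Returns locus -> 'hyper' (higher than reference) or 'hypo' (lower than reference).
    Loci missing from the sample are ignored.
    """
    calls: Dict[str, str] = {}
    for locus, ref_level in reference.items():
        if locus not in sample:
            continue
        diff = sample[locus] - ref_level
        if diff > min_diff:
            calls[locus] = "hyper"
        elif diff < -min_diff:
            calls[locus] = "hypo"
    return calls


def indicates_disorder(
    sample: Mapping[str, float],
    reference: Mapping[str, float],
    min_diff: float = 0.05,
    min_loci: int = 1,
) -> bool:
    """Hypothetical decision rule: flag the sample if at least min_loci loci differ."""
    return len(differential_loci(sample, reference, min_diff)) >= min_loci


reference = {"locus_a": 0.30, "locus_b": 0.80, "locus_c": 0.50}
sample = {"locus_a": 0.45, "locus_b": 0.60, "locus_c": 0.52}
print(differential_loci(sample, reference))  # {'locus_a': 'hyper', 'locus_b': 'hypo'}
print(indicates_disorder(sample, reference))  # True
```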

In certain embodiments, diagnosis of a subject with a CNS disorder can be based on the methylated or unmethylated state of a genomic locus, e.g., a CpG site. In certain embodiments, a genomic locus, e.g., a CpG site, in a sample from a subject diagnosed with a CNS disorder can be methylated and the genomic locus, e.g., the CpG site, in a reference sample can be unmethylated. In certain embodiments, a genomic locus in a sample from a subject diagnosed with a CNS disorder can be unmethylated and the genomic locus in a reference sample can be methylated.

Diagnostic, Prognostic, Classification and Monitoring Methods Using an Algorithm of this Example

This Example further provides diagnostic and prognostic methods that use an algorithm, as disclosed in Example Embodiment E, for diseases and/or disorders that are characterized by differential methylation of genomic loci. For example, but not by way of limitation, this Example provides methods for diagnosing, prognosing, classifying and/or monitoring a CNS disorder in a subject that include analyzing the methylation status of certain genomic loci and/or genomic fractions.

Methods of Treatment and/or Prevention of this Example

This Example further provides methods of treating a subject with a CNS disorder. Non-limiting examples of CNS disorders that can be treated by the presently disclosed methods include brain and spinal cord tumors, brain and spinal cord infections (e.g., meningitis, brain abscesses and encephalitis), neurodegenerative diseases (e.g., Parkinson's disease, Alzheimer's disease, Huntington's disease, spinocerebellar ataxia, spinal muscular atrophy, motor neuron diseases and prion disease), CNS inflammations (e.g., CNS vasculitis, antibody-mediated inflammatory brain diseases, demyelinating conditions, Rasmussen's encephalitis, neurosarcoidosis, and inflammation that occurs secondary to another disease in the body, e.g., meningitis), and neuropsychiatric disorders (e.g., depression, schizophrenia and autism spectrum disorders). This Example also provides methods of preventing the development of a CNS disorder in a subject.

In certain embodiments, the methods disclosed herein can include diagnosing a subject with a CNS disorder as disclosed herein, followed by treatment of the subject. Any diagnosis methods of this Example can be used with the methods for treating a subject. In certain embodiments, the methods of this Example can include determining that a subject is at risk of developing a CNS disorder as disclosed herein, followed by a method for preventing the development of the CNS disorder in the subject.

In certain embodiments, the treatment methods can include (a) obtaining a cerebrospinal fluid sample from the subject, (b) determining the methylation status and/or level of one or more genomic loci present in the cerebrospinal fluid sample, (c) comparing the methylation status and/or level of the one or more genomic loci to a reference, (d) diagnosing a CNS disorder in the subject, where the difference in the methylation status and/or level of the one or more genomic loci in the cerebrospinal fluid sample compared to the reference indicates the presence of the CNS disorder in the subject and (e) treating the subject diagnosed with the CNS disorder.

In certain embodiments, the methods of treatment can include diagnosing the subject with a CNS disorder by use of the algorithm disclosed herein. For example, but not by way of limitation, the treatment method can include (a) diagnosing the subject with a CNS disorder by use of the disclosed algorithm and (b) treating the subject diagnosed with the CNS disorder.

In certain embodiments, the prophylactic methods can include (a) obtaining a cerebrospinal fluid sample from the subject, (b) determining the methylation status and/or level of one or more genomic loci present in the cerebrospinal fluid sample, (c) comparing the methylation status and/or level of the one or more genomic loci to a reference, (d) determining that the subject is at risk of developing a CNS disorder, where the difference in the methylation status and/or level of the one or more genomic loci in the cerebrospinal fluid sample compared to the reference indicates that the subject is at risk of developing a CNS disorder and (e) preventing the subject from developing the CNS disorder.

In certain embodiments, the prophylactic methods can include (a) determining that the subject is at risk of developing a CNS disorder by use of the disclosed algorithm of this Example and (b) preventing the subject from developing the CNS disorder.

Any suitable treatment methods and preventative methods known in the art can be used with the presently disclosed methods for treating and preventing CNS disorders. For example, but not by way of limitation, the subject can be treated by administration of a medication to treat the CNS disorder. Non-limiting examples of treatment methods for CNS disorders including depression, epilepsy, multiple sclerosis (MS), neurodegenerative diseases (e.g., Alzheimer's disease), neuropathic pain and schizophrenia are disclosed in DiNunzio and Williams, “CNS Disorders—Current Treatment Options and the Prospects for Advanced Therapies,” Drug Development and Industrial Pharmacy, 34(11):1141-1167 (2008).

In certain embodiments, the information provided by the methods described herein can be used by a physician in determining the most effective course of treatment (e.g., preventative or therapeutic) for the subject. A course of treatment refers to the measures taken for a patient after the prognosis or the assessment of increased risk for development of a CNS disorder is made. For example, when a subject is identified as having an increased risk of developing a CNS disorder, the physician can determine whether frequent monitoring of DNA methylation changes should be performed as a prophylactic measure. Also, when a subject is diagnosed with a CNS disorder (e.g., based on the presence of a DNA methylation pattern in a sample from the subject), it can be advantageous to follow such detection with a therapeutic treatment.

In certain embodiments, this Example further provides methods for assessing the efficacy of a therapeutic or prophylactic therapy for preventing, inhibiting or treating a CNS disorder in a subject, comprising determining the methylation status of one or more genomic loci present in a cerebrospinal fluid sample obtained from the subject prior to the therapy and determining the methylation status of the one or more genomic loci present in a cerebrospinal fluid sample obtained from the subject at one or more time points during the therapeutic or prophylactic therapy, wherein the therapy is efficacious for preventing, inhibiting and/or treating the CNS disorder in the subject when there is a change in the presence and/or level of methylation of the one or more genomic loci in the second or subsequent samples relative to the first sample.

In certain embodiments, the first sample is obtained after therapeutic treatment has begun.

In certain embodiments, the methods for monitoring the response in a subject to prophylactic or therapeutic treatment can include measuring the methylation status and/or level of one or more genomic loci in a cerebrospinal fluid sample of a subject at a first timepoint, administering a therapeutic agent, re-measuring the methylation status and/or level of the one or more genomic loci at a second timepoint, comparing the results of the first and second measurements and optionally modifying the treatment regimen based on the comparison. In certain embodiments, the first timepoint is prior to an administration of the therapeutic agent, and the second timepoint is after said administration of the therapeutic agent. In certain embodiments, the first timepoint is prior to the administration of the therapeutic agent to the subject for the first time. In certain embodiments, the dose (defined as the quantity of therapeutic agent administered at any one administration) is increased or decreased in response to the comparison. In certain embodiments, the dosing interval (defined as the time between successive administrations) can be increased or decreased in response to the comparison, including total discontinuation of treatment. In addition, the method of the present disclosure can be used to determine the efficacy of the therapeutic treatment, wherein a change in the methylation status of certain genomic loci present in a cerebrospinal fluid sample of a subject can indicate that the therapeutic treatment regimen can be altered, reduced and/or stopped.
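The monitoring steps above can be illustrated with a minimal Python sketch that compares each later measurement with the first timepoint and reports which loci changed by more than a chosen absolute amount. The 0.05 change threshold, the locus names and the values are hypothetical; how a detected change maps to altering, reducing or stopping a dose is a clinical decision that the sketch deliberately leaves out.

```python
from typing import Dict, List, Set


def changed_loci(first: Dict[str, float], later: Dict[str, float],
                 min_change: float = 0.05) -> Set[str]:
    """Loci whose methylation level moved by more than min_change between timepoints."""
    return {
        locus
        for locus in first.keys() & later.keys()
        if abs(later[locus] - first[locus]) > min_change
    }


def monitor(first: Dict[str, float], follow_ups: List[Dict[str, float]],
            min_change: float = 0.05) -> List[Set[str]]:
    """Compare every follow-up timepoint with the first timepoint."""
    return [changed_loci(first, later, min_change) for later in follow_ups]


# First timepoint (e.g., before therapy) and two later timepoints; hypothetical values.
baseline = {"locus_A": 0.30, "locus_B": 0.70}
follow_ups = [
    {"locus_A": 0.32, "locus_B": 0.69},  # within the threshold at both loci
    {"locus_A": 0.45, "locus_B": 0.68},  # locus_A has shifted by 0.15
]
for timepoint, changed in enumerate(monitor(baseline, follow_ups), start=2):
    print(timepoint, sorted(changed))
# 2 []
# 3 ['locus_A']
```

An absolute threshold is used here because methylation fractions near zero make relative changes unstable; a relative threshold, as in the diagnostic comparison, would work equally well for loci with moderate baseline levels.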

Assays of this Example

This Example further provides assays and/or methods for determining the DNA methylation status and/or level of genomic loci that correlate with the presence, absence and/or severity of a CNS disorder.

In certain embodiments, the assay method of this Example can include comparing the methylation status and/or level of genomic loci present in a cerebrospinal fluid sample from a subject that has a CNS disorder to the methylation status and/or level of genomic loci in a cerebrospinal fluid sample from a healthy subject to determine the methylation pattern, as disclosed above, that correlates with the presence of the CNS disorder.

In certain embodiments, the assay methods of this Example can include comparing the methylation status and/or level of genomic loci in a cerebrospinal fluid sample from a subject that has a CNS disorder at an early stage to the methylation status and/or level of genomic loci in a cerebrospinal fluid sample from a subject that has the CNS disorder at a late stage to determine the methylation status and/or level that correlates with the different stages and/or severity of the CNS disorder.

Non-limiting examples of CNS disorders include brain and spinal cord tumors, brain and spinal cord infections (e.g., meningitis, brain abscesses and encephalitis), neurodegenerative diseases (e.g., Parkinson's disease, Alzheimer's disease, Huntington's disease, spinocerebellar ataxia, spinal muscular atrophy, motor neuron diseases and prion disease), CNS inflammations (e.g., CNS vasculitis, antibody-mediated inflammatory brain diseases, demyelinating conditions, Rasmussen's encephalitis, neurosarcoidosis, and inflammation that occurs secondary to another disease in the body, e.g., meningitis), and neuropsychiatric disorders (e.g., depression, schizophrenia and autism spectrum disorders).

DNA Isolation Techniques of this Example

In certain embodiments, the methods of this Example include isolating nucleic acid from a cerebrospinal fluid sample obtained from a subject. Several platforms for isolating nucleic acids from cerebrospinal fluid samples are known in the art and currently available. For example, but not by way of limitation, DNA can be isolated from a cerebrospinal fluid sample by extraction with organic solvents, such as a mixture of phenol and chloroform, followed by precipitation with ethanol (see, for example, J. Sambrook et al., “Molecular Cloning: A Laboratory Manual,” 1989, 2nd Ed., Cold Spring Harbor Laboratory Press: New York, N.Y.). Additional non-limiting examples include salting-out DNA extraction (see, for example, P. Sunnucks et al., Genetics, 1996, 144:747-756; and S. M. Aljanabi and I. Martinez, Nucl. Acids Res. 1997, 25:4692-4693), the trimethylammonium bromide salts DNA extraction method (see, for example, S. Gustincich et al., BioTechniques, 1991, 11:298-302) and the guanidinium thiocyanate DNA extraction method (see, for example, J. B. W. Hammond et al., Biochemistry, 1996, 240:298-300). Numerous commercially available kits can also be used to obtain DNA from biological fluids (e.g., cerebrospinal fluid) or cells of a subject, for example, Qiagen's Gentra Puregene Cell Kit, QIAamp Circulating Nucleic Acid Kit, QIAamp DNA Mini Kit, DNeasy Blood and Tissue Kit or QIAamp DNA Blood Mini Kit (Qiagen, Hilden, Germany), GenomicPrep™ Blood DNA Isolation Kit (Promega, Madison, Wis.) and GFX™ Genomic Blood DNA Purification Kit (Amersham, Piscataway, N.J.).

Methylation Detection Techniques of this Example

Various methylation analysis procedures are known in the art and can be used with the methods of this Example. These assays allow determination of the methylation state of one or more genomic loci, e.g., one or more CpG sites or CpG islands, within a nucleic acid obtained from a cerebrospinal fluid sample. In addition, the methods can be used to quantify the methylation of a genomic locus. Such assays include, among other techniques, DNA sequencing of bisulfite-treated DNA, PCR (for sequence-specific amplification), digital PCR and the use of methylation-sensitive restriction enzymes.

In certain embodiments, methylation-specific PCR can be used to determine the methylation status of a genomic locus. Methylation-specific PCR is based on the reaction of sodium bisulfite with DNA, which converts unmethylated cytosines, e.g., of CpG dinucleotides, to uracil (so that an unmethylated CpG becomes UpG), followed by conventional PCR. Methylated cytosines are not converted in this process, so primers designed to overlap the methylation site of interest, e.g., a CpG site, can distinguish whether that site is methylated or unmethylated. Additionally, restriction enzyme digestion of PCR products amplified from bisulfite-converted DNA can be used, e.g., the method described by Sadri & Hornsby (Nucl. Acids Res. 1996; 24:5058-5059) or COBRA (Combined Bisulfite Restriction Analysis) (Xiong & Laird, Nucleic Acids Res. 1997; 25:2532-2534).
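The logic of bisulfite conversion and methylation-specific primer matching can be sketched in Python. This is a simplified model, assuming a single forward strand, complete conversion, and exact string matching in place of primer hybridization; the sequence, primer and function names are hypothetical.

```python
from typing import Set


def bisulfite_convert(seq: str, methylated: Set[int]) -> str:
    """Model bisulfite treatment followed by PCR on one strand.

    Unmethylated cytosines become uracil and are read as thymine after PCR;
    cytosines at the 0-based positions in `methylated` are protected and stay C.
    """
    out = []
    for i, base in enumerate(seq.upper()):
        if base == "C" and i not in methylated:
            out.append("T")
        else:
            out.append(base)
    return "".join(out)


def primer_matches(template: str, primer: str) -> bool:
    """Crude stand-in for primer annealing: exact substring match."""
    return primer in template


seq = "ACGTCCGA"                          # CpG cytosines at positions 1 and 5
meth = bisulfite_convert(seq, {1, 5})     # both CpGs methylated
unmeth = bisulfite_convert(seq, set())    # no methylated cytosines
print(meth, unmeth)                       # ACGTTCGA ATGTTTGA

# A primer that retains the CpG "C" anneals only to the methylated template.
m_primer = "ACGTTCG"
print(primer_matches(meth, m_primer), primer_matches(unmeth, m_primer))  # True False
```

The non-CpG cytosine at position 4 is converted in both templates, which is why methylation-specific primers are designed over CpG sites rather than other cytosines.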

In certain embodiments, whole genome bisulfite sequencing, a high-throughput genome-wide analysis of DNA methylation, can be used to determine the methylation status of multiple genomic loci. Genomic DNA is converted with sodium bisulfite, as described above, and then sequenced on a next-generation sequencing platform. The resulting reads are aligned to the reference genome, and the methylation states of cytosines, e.g., of CpG dinucleotides, within the analyzed genomic loci are determined from C-to-T mismatches, which arise because unmethylated cytosines are converted to uracil and read as thymine after amplification.
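Methylation calling from aligned bisulfite reads can be sketched as follows: at each reference CpG, a read base of C is counted as methylated and T as unmethylated, and the methylation level is the methylated fraction. This minimal sketch assumes forward-strand reads that are already aligned without gaps (each given as a 0-based start and a sequence); production aligners such as Bismark also handle the reverse strand, where the same information appears as G/A, along with base qualities and indels. The function name and data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def call_cpg_methylation(reference: str,
                         reads: List[Tuple[int, str]]) -> Dict[int, float]:
    """Return {CpG position: fraction methylated} for CpGs covered by reads."""
    cpgs = [i for i in range(len(reference) - 1) if reference[i:i + 2] == "CG"]
    methylated = defaultdict(int)
    unmethylated = defaultdict(int)
    for start, seq in reads:
        for pos in cpgs:
            offset = pos - start
            if 0 <= offset < len(seq):
                base = seq[offset]
                if base == "C":      # protected from conversion -> methylated
                    methylated[pos] += 1
                elif base == "T":    # converted C -> unmethylated
                    unmethylated[pos] += 1
    return {
        pos: methylated[pos] / (methylated[pos] + unmethylated[pos])
        for pos in cpgs
        if methylated[pos] + unmethylated[pos] > 0
    }


reference = "ACGTTACGA"                  # CpGs at positions 1 and 6
reads = [(0, "ACGTTATGA"), (0, "ATGTTACGA"), (1, "TGTTACG")]
levels = call_cpg_methylation(reference, reads)
print({pos: round(level, 3) for pos, level in levels.items()})  # {1: 0.333, 6: 0.667}
```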

In certain embodiments, genome-wide DNA methylation profiling can be performed using commercially available arrays, thereby allowing the interrogation of multiple genomic loci, e.g., multiple CpG sites. Non-limiting examples of such arrays include HumanMethylation BeadChips (Illumina, San Diego, Calif.) and the Infinium MethylationEPIC kit (Illumina). Additional methods for analyzing the methylation state of multiple genomic loci are provided in Yong et al., Epigenetics & Chromatin 2016; 9:26, which is incorporated by reference herein.
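For bead arrays, per-CpG methylation is commonly summarized from the methylated (M) and unmethylated (U) signal intensities either as a beta value, M/(M + U + α) with an offset α (often 100), or as an M-value, log2((M + α)/(U + α)) with a small offset (often 1). The sketch below implements those widely used conventions; the offsets are assumptions rather than parameters stated in this document, and the intensities are made up.

```python
import math


def beta_value(meth: float, unmeth: float, offset: float = 100.0) -> float:
    """Fraction-like methylation level in [0, 1); negative intensities are clipped to 0."""
    m, u = max(meth, 0.0), max(unmeth, 0.0)
    return m / (m + u + offset)


def m_value(meth: float, unmeth: float, offset: float = 1.0) -> float:
    """Log2 ratio of methylated to unmethylated signal; 0 means equal signal."""
    m, u = max(meth, 0.0), max(unmeth, 0.0)
    return math.log2((m + offset) / (u + offset))


print(round(beta_value(3000, 1000), 3))  # 0.732
print(m_value(1000, 1000))               # 0.0
```

Beta values read directly as an approximate fraction methylated. M-values are often preferred for statistical testing because their variance is more uniform across the methylation range.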

Kits of this Example

This Example provides kits for diagnosing, monitoring, classifying and/or treating a subject with a CNS disorder. The kits of this Example comprise a means for determining and/or detecting the methylation status of one or more genomic loci.

Types of kits of this Example include, but are not limited to, packaged probe and primer sets (e.g., TaqMan probe/primer sets), arrays/microarrays, which further contain one or more probes, primers or other detection reagents for determining the methylation state and/or level of one or more genomic loci. For example, but not by way of limitation, a kit of this Example can include one or more probes or primers for detecting the methylation state of one or more genomic loci. In certain embodiments, the one or more genomic loci comprise a CpG site. In certain embodiments, one or more of the genomic loci do not comprise a CpG site. For example, but not by way of limitation, about 5% or more, about 10% or more, about 15% or more, about 20% or more, about 25% or more, about 30% or more, about 35% or more, about 40% or more, about 45% or more, about 50% or more, about 55% or more, about 60% or more, about 65% or more or about 70% or more of the one or more genomic loci detected by the primers or probes of this Example comprise one or more CpG sites.

In certain non-limiting embodiments, a primer and/or probe of this Example can be at least about 10 nucleotides or at least about 15 nucleotides or at least about 20 nucleotides in length and/or up to about 200 nucleotides or up to about 150 nucleotides or up to about 100 nucleotides or up to about 75 nucleotides or up to about 50 nucleotides in length.

In a further non-limiting embodiment, the oligonucleotide primers and/or probes can be immobilized on a solid surface or support, for example, on a nucleic acid microarray, wherein the position of each oligonucleotide primer and/or probe bound to the solid surface or support is known and identifiable.

In certain non-limiting embodiments, the kits of this Example additionally include other components such as, but not limited to, a buffer, enzymes such as DNA polymerases or ligases, nucleotides such as deoxynucleotide triphosphates, positive control sequences, negative control sequences and the like necessary to carry out an assay or reaction to detect the methylation state of a genomic locus.

In certain embodiments, the kits of this Example include a container comprising one or more probes and/or primers for detecting the methylation state of one or more genomic loci. In certain embodiments, the kits further include instructions for use, e.g., the instructions can describe that a particular methylation status of a genomic locus is indicative of a CNS disorder in a subject. The instructions can be printed directly on the container (when present), or as a label applied to the container, or as a separate sheet, pamphlet, card or folder supplied in or with the container.

Reports, Programmed Computers, and Systems of this Example

In certain embodiments, the presently disclosed diagnosis and/or monitoring of a CNS disorder in a subject based on the methylation status of one or more genomic loci can be referred to herein as a “report.” A tangible report can optionally be generated as part of a testing process (which can be interchangeably referred to herein as “reporting,” or as “providing” a report, “producing” a report or “generating” a report).

Examples of tangible reports can include, but are not limited to, reports in paper (such as computer-generated printouts of test results) or equivalent formats and reports stored on computer readable medium (such as a CD, USB flash drive or other removable storage device, computer hard drive, or computer network server, etc.). Reports, particularly those stored on computer readable medium, can be part of a database, which can optionally be accessible via the internet (such as a database of patient records or genetic information stored on a computer network server, which can be a “secure database” that has security features that limit access to the report, such as to allow only the patient and the patient's medical practitioners to view the report while preventing other unauthorized individuals from viewing the report, for example). In addition to, or as an alternative to, generating a tangible report, reports can also be displayed on a computer screen (or the display of another electronic device or instrument).

A report can include, for example, an individual's medical history, or can just include size, presence, absence or levels of one or more markers (for example, a report on computer readable medium such as a network server can include hyperlink(s) to one or more journal publications or websites that describe the medical/biological implications). Thus, for example, the report can include information of medical/biological significance as well as optionally also including information regarding the methylation status of relevant genomic loci, or the report can just include information regarding the methylation status of relevant genomic loci without other medical/biological significance.
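A report stored on a computer readable medium or in a database could be represented as a simple structured record. The sketch below shows one hypothetical layout, with field names chosen for illustration only, covering the option described above of reporting methylation status alone or adding an interpretation and links to supporting publications.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List, Optional


@dataclass
class MethylationReport:
    """Hypothetical report layout; all field names are illustrative."""
    sample_id: str
    sample_type: str
    locus_methylation: Dict[str, float]        # locus -> methylation level (0-1)
    interpretation: Optional[str] = None       # None when only raw results are reported
    references: List[str] = field(default_factory=list)  # e.g., links to publications

    def to_json(self) -> str:
        """Serialize for storage in a database or transmission to a requester."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)


report = MethylationReport(
    sample_id="SAMPLE-001",
    sample_type="cerebrospinal fluid",
    locus_methylation={"locus_A": 0.30, "locus_B": 0.80},
)
print(report.to_json())
```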

A report can further be “transmitted” or “communicated” (these terms can be used herein interchangeably), such as to the individual who was tested, a medical practitioner (e.g., a doctor, nurse, clinical laboratory practitioner, genetic counselor, etc.), a healthcare organization, a clinical laboratory, and/or any other party or requester intended to view or possess the report. The act of “transmitting” or “communicating” a report can be by any means known in the art, based on the format of the report. Furthermore, “transmitting” or “communicating” a report can include delivering a report (“pushing”) and/or retrieving (“pulling”) a report. For example, reports can be transmitted/communicated by various means, including being physically transferred between parties (such as for reports in paper format) such as by being physically delivered from one party to another, or by being transmitted electronically or in signal form (e.g., via e-mail or over the internet, by facsimile, and/or by any wired or wireless communication methods known in the art) such as by being retrieved from a database stored on a computer network server, etc.

In certain embodiments, the disclosed subject matter provides computers (or other apparatus/devices such as biomedical devices or laboratory instrumentation) programmed to carry out the methods described herein, e.g., to perform the algorithm of this Example (see Example Embodiment E). In certain embodiments, the system can be controlled by the individual and/or their medical practitioner in that the individual and/or their medical practitioner requests the test, receives the test results back and (optionally) acts on the test results to reduce the individual's CNS disorder risk or treat the individual, such as by implementing a disorder management system.

The following Example Embodiments are offered to more fully illustrate the disclosure of this Example, but are not to be construed as limiting the scope thereof.

Example Embodiment D of Example 2: Non-Invasive Molecular Phenotyping of the Human Brain Via Epigenomic Liquid Biopsy of Cerebrospinal Fluid

Example Embodiment D describes methods of epigenomic liquid biopsy for the comprehensive analysis of cell-free DNA (cfDNA) methylation signatures in the human central nervous system (CNS). Example Embodiment D used solution phase hybridization and high-throughput bisulfite sequencing to compare DNA methylation signatures of cfDNA obtained from cerebrospinal fluid (CSF) and plasma. Recovery of cfDNA from CSF was relatively low (68 to 840 pg/mL CSF) compared to plasma. Distributions of CpG methylation differed significantly between CSF and plasma, both overall and at the level of specific functional elements such as exons, introns, CpG islands and CpG shores. Sliding window analysis was used to identify differentially methylated CpG sites. Example Embodiment D found numerous gene- and locus-specific differences in CpG methylation between cfDNA from CSF and plasma, and these loci were more likely to be hypomethylated in CSF than in plasma. Differentially methylated CpGs in CSF were identified in genes related to neurite branching and neuronal development. Example Embodiment D found a clear association between tissue-specific gene expression in the CNS and cfDNA methylation patterns in CSF. Ingenuity Pathway Analysis (IPA) of differentially methylated regions identified an enrichment of functional pathways related to neurobiology. The GTEx RNA expression database was used to analyze the presently disclosed data in the context of CNS-specific gene expression. In conclusion, Example Embodiment D was the first comprehensive quantitative genome-wide analysis of DNA methylation in human CSF. The presently disclosed methods of this Example can be used for epigenomic liquid biopsy of the human CNS for molecular phenotyping of brain-derived DNA methylation signatures.
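A sliding-window comparison of CSF and plasma methylation can be sketched as below. This is a simplified stand-in, not the MethylSig analysis used in Example Embodiment D: it averages per-CpG differences (CSF minus plasma) within fixed-width windows and reports windows that pass hypothetical minimum-CpG and minimum-difference filters, without any statistical test. The window size, step, thresholds and data are all assumptions.

```python
from typing import Dict, List, Tuple


def sliding_window_dm(csf: Dict[int, float], plasma: Dict[int, float],
                      window: int = 100, step: int = 50, min_cpgs: int = 3,
                      min_diff: float = 0.2) -> List[Tuple[int, int, float]]:
    """Return (start, end, mean CSF-minus-plasma difference) for passing windows."""
    positions = sorted(csf.keys() & plasma.keys())
    hits = []
    if not positions:
        return hits
    start = positions[0]
    while start <= positions[-1]:
        end = start + window
        inside = [p for p in positions if start <= p < end]
        if len(inside) >= min_cpgs:
            diff = sum(csf[p] - plasma[p] for p in inside) / len(inside)
            if abs(diff) >= min_diff:
                hits.append((start, end, diff))
        start += step
    return hits


# Hypothetical per-CpG methylation levels on one chromosome (position -> level).
csf = {10: 0.2, 20: 0.2, 30: 0.2, 200: 0.7, 210: 0.7, 220: 0.7}
plasma = {10: 0.8, 20: 0.8, 30: 0.8, 200: 0.7, 210: 0.7, 220: 0.7}
for start, end, diff in sliding_window_dm(csf, plasma):
    print(start, end, round(diff, 2))  # 10 110 -0.6
```

A negative difference corresponds to hypomethylation in CSF relative to plasma, the direction reported above as more common. For genome-scale data, the inner list comprehension would be replaced by a binary search over the sorted positions.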

Example Embodiment D obtained CSF from healthy volunteers, and recovered brain-derived cfDNA from CSF. Example Embodiment D also quantified and catalogued the brain-derived cfDNA in a genome-wide fashion at the level of DNA methylation via bisulfite sequencing. In order to generate preliminary functional insights into brain-specific molecular phenotypes, solution-phase hybridization coupled with high-throughput sequencing was used to compare DNA methylation of cfDNA from CSF with that from plasma.

Methods

Human Samples

Plasma and CSF samples were processed for cell-free nucleic acid analysis as previously described (Chu T et al., PLoS One. 2017; 12(3):e0171882). Lumbar puncture for CSF collection from healthy volunteers was performed under fluoroscopic guidance by the Neuroradiology Department at UPMC. All participants provided informed consent. CSF donors had no personal or first-degree-relative history of psychiatric disorder or suicidal behavior.

Preparation and Analysis of Bisulfite DNA Sequencing Libraries

DNA sequencing libraries were prepared using the SeqCap-Epi kit (Roche). Libraries were sequenced on an Illumina HiSeq 2500 instrument. Reads were quality trimmed and adaptor sequences were removed using Trim Galore. Reads were aligned with Bismark in paired-end mode using Bowtie 2. Unmapped reads were then aligned in single-end mode (Bismark, Bowtie 2). Duplicate reads were removed (Bismark). Methylation was called separately on the paired-end and single-end alignments, and the results were merged. CpG methylation information for autosomes was read into MethylSig for differential methylation analysis.
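The merge of paired-end and single-end methylation calls and the restriction to autosomes can be sketched as follows. The sketch assumes input in the six-column Bismark coverage layout (chromosome, start, end, percent methylated, methylated count, unmethylated count) and UCSC-style chromosome names (chr1 to chr22); both are assumptions about how the files were named and formatted, and the records shown are made up.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, Tuple

AUTOSOMES = {f"chr{i}" for i in range(1, 23)}  # assumes UCSC-style names


def merge_coverage(*cov_texts: str) -> Dict[Tuple[str, int], Tuple[int, int]]:
    """Sum methylated/unmethylated counts per autosomal CpG across coverage files."""
    counts = defaultdict(lambda: [0, 0])
    for text in cov_texts:
        for row in csv.reader(io.StringIO(text), delimiter="\t"):
            if not row:
                continue
            chrom, start, _end, _pct, n_meth, n_unmeth = row
            if chrom not in AUTOSOMES:
                continue  # drop sex chromosomes, mitochondrial and unplaced contigs
            key = (chrom, int(start))
            counts[key][0] += int(n_meth)
            counts[key][1] += int(n_unmeth)
    return {key: (m, u) for key, (m, u) in counts.items()}


paired_end = "chr1\t100\t100\t50.0\t2\t2\nchrX\t5\t5\t100.0\t3\t0\n"
single_end = "chr1\t100\t100\t100.0\t1\t0\nchr2\t7\t7\t0.0\t0\t4\n"
merged = merge_coverage(paired_end, single_end)
print(merged)  # {('chr1', 100): (3, 2), ('chr2', 7): (0, 4)}
```

Summing counts rather than averaging percentages weights each CpG by its read depth; the merged methylation level at a site is then m / (m + u).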

Results

Example Embodiment D examined the physical properties of cfDNA in CSF. Following extraction, DNA amount was quantified by real time PCR. The resulting yields for a series of CSF samples are shown in Table 5A.

TABLE 5A Amounts of cfDNA recovered from a series of CSF samples. Amount of DNA is shown in picograms per mL of starting CSF.

Sample | Amount of cfDNA recovered from 1 mL CSF (pg)
LP667 | 145
LP676 | 164
LP418 | 180
LP603 | 228
LP429 | 278
LP435 | 143
LP630 | 840
LP638 | 309
LP673 | 188
LP678 | 402
LP366 | 68

Quantities of cfDNA were low, ranging from 68 pg to 840 pg per mL of input CSF (mean 268 pg/mL). This is significantly lower than cfDNA recovery from plasma (mean approximately 5,322 pg/mL; Table 5B).

TABLE 5B Amounts of cfDNA recovered from a series of plasma samples obtained from non-pregnant females. Amount of DNA is shown in picograms per mL of starting plasma.

Sample | DNA (pg/mL of plasma)
EN7 | 3320
EN8 | 3770
EN9 | 4260
EN11 | 2830
EN12 | 8390
EN13 | 6390
EN14 | 2720
EN15 | 6920
EN16 | 8060
EN17 | 8260
EN18 | 3590
EN20 | 5350
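The summary figures for Tables 5A and 5B can be reproduced with a few lines of Python. The values are copied from the tables, and the fold difference is simply the ratio of the two means.

```python
from statistics import mean

# pg of cfDNA per mL of starting fluid, copied from Tables 5A and 5B.
csf_pg_per_ml = [145, 164, 180, 228, 278, 143, 840, 309, 188, 402, 68]
plasma_pg_per_ml = [3320, 3770, 4260, 2830, 8390, 6390, 2720, 6920,
                    8060, 8260, 3590, 5350]

csf_mean = mean(csf_pg_per_ml)
plasma_mean = mean(plasma_pg_per_ml)
print(min(csf_pg_per_ml), max(csf_pg_per_ml))  # 68 840
print(round(csf_mean), round(plasma_mean))    # 268 5322
print(round(plasma_mean / csf_mean, 1))       # 19.9
```

On average, plasma yielded roughly 20 times more cfDNA per mL than CSF in these samples.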

Example Embodiment D then explored the size distribution of cfDNA from CSF and compared this to plasma. It was found that cfDNA fragments recovered from CSF had a mean size of approximately 155 bp, which was approximately 20 bp less than the mean fragment size of cfDNA recovered from plasma (Table 6).

TABLE 6 Mean fragment length of cfDNA in samples.

Sample Name (Type) | Mean Fragment Length (bp) | Median Fragment Length (bp)
LP374 (CSF) | 154 | 155
LP374 (CSF) | 158 | 159
EN1 (Plasma) | 178 | 175
EN2 (Plasma) | 182 | 176
EN3 (Plasma) | 178 | 176
EN4 (Plasma) | 170 | 170
EN5 (Plasma) | 177 | 175
EN6 (Plasma) | 181 | 177
EN7 (Plasma) | 176 | 174
EN8 (Plasma) | 175 | 174
EN9 (Plasma) | 177 | 174
EN11 (Plasma) | 177 | 175
EN12 (Plasma) | 172 | 171
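The approximately 20 bp difference between CSF and plasma can be checked from Table 6 by averaging the per-sample mean fragment lengths in each group. This sketch treats the two LP374 rows as separate measurements, which is an assumption, since the table does not say why that sample appears twice.

```python
from statistics import mean

# Mean fragment lengths (bp) copied from Table 6.
csf_means = [154, 158]
plasma_means = [178, 182, 178, 170, 177, 181, 176, 175, 177, 177, 172]

difference = mean(plasma_means) - mean(csf_means)
print(mean(csf_means), round(mean(plasma_means), 1), round(difference, 1))
# 156 176.6 20.6
```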

Example Embodiment D performed solution phase hybridization capture of bisulfite-converted cfDNA from two CSF samples and two plasma samples. To further explore size differences between CSF and plasma, fragment size was plotted against read count, which confirmed that cfDNA fragments from CSF were significantly shorter than those from plasma (FIG. 26). Furthermore, it was found that cfDNA from CSF exhibits a periodicity of fragment density peaks differing in size by approximately 10 bp (FIG. 27 and Table 7).

TABLE 7 Fragment size distribution for cfDNA from CSF (LP372 and LP374) and plasma (EN1)

Fragment Length (bp) | LP372 | LP374 | EN1
20 | 18.1655359 | 4.12935877 | 0
21 | 12.1103572 | 8.25871754 | 0
22 | 36.3310717 | 12.3880763 | 0
23 | 0 | 8.25871754 | 0
24 | 30.2758931 | 16.5174351 | 0
25 | 18.1655359 | 16.5174351 | 1
26 | 12.1103572 | 12.3880763 | 1
27 | 30.2758931 | 20.6467938 | 0
28 | 30.2758931 | 28.9055114 | 0
29 | 30.2758931 | 66.0697403 | 3
30 | 121.103572 | 119.751404 | 4
31 | 84.7725007 | 90.8458929 | 4
32 | 145.324287 | 115.622046 | 3
33 | 211.931252 | 152.786274 | 3
34 | 242.207145 | 251.890885 | 4
35 | 260.372681 | 264.278961 | 5
36 | 351.20036 | 280.796396 | 6
37 | 423.862503 | 545.075358 | 16
38 | 611.573041 | 491.393694 | 12
39 | 708.455898 | 730.896502 | 21
40 | 666.069648 | 743.284579 | 18
41 | 811.393935 | 958.011235 | 18
42 | 865.890543 | 962.140593 | 15
43 | 847.725007 | 1048.85713 | 28
44 | 1144.42876 | 1350.30032 | 38
45 | 1271.58751 | 1643.48479 | 47
46 | 1441.13251 | 1742.5894 | 48
47 | 1592.51198 | 2229.85374 | 51
48 | 1955.82269 | 2518.90885 | 78
49 | 2682.44413 | 3274.5815 | 75
50 | 2960.98234 | 3266.32279 | 119
51 | 2809.60288 | 3563.63662 | 116
52 | 2767.21663 | 3534.73111 | 105
53 | 2858.04431 | 4021.99544 | 115
54 | 2906.48574 | 4083.93582 | 133
55 | 3039.69967 | 4492.74234 | 146
56 | 3663.38306 | 4860.25527 | 174
57 | 4196.23878 | 5520.95268 | 223
58 | 4813.867 | 6838.21812 | 291
59 | 6339.77201 | 7808.61743 | 292
60 | 6618.31023 | 8779.01674 | 341
61 | 6981.62095 | 9220.85813 | 316
62 | 6503.26184 | 8704.68829 | 348
63 | 5958.29576 | 9319.96274 | 344
64 | 6479.04112 | 9241.50493 | 378
65 | 6715.19309 | 9823.74451 | 424
66 | 7139.05559 | 9943.49592 | 487
67 | 7902.0081 | 10587.6759 | 489
68 | 8513.58114 | 12045.3395 | 577
69 | 10856.9353 | 13296.5352 | 689
70 | 11856.0397 | 14308.2281 | 810
71 | 10826.6594 | 14081.1134 | 736
72 | 10669.2247 | 13887.0335 | 730
73 | 10608.6729 | 13903.551 | 740
74 | 10536.0108 | 14502.308 | 769
75 | 11074.9217 | 14750.0695 | 853
76 | 11886.3156 | 15596.5881 | 936
77 | 12346.5092 | 16121.0166 | 1016
78 | 15265.1053 | 18293.0594 | 1236
79 | 19425.013 | 21522.2179 | 1580
80 | 23173.1686 | 24573.814 | 1851
81 | 22997.5684 | 24668.7893 | 1851
82 | 19848.8755 | 22067.2933 | 1602
83 | 16627.5205 | 19647.489 | 1309
84 | 16312.6512 | 18590.3732 | 1297
85 | 16845.5069 | 20597.2415 | 1406
86 | 17160.3762 | 21447.8895 | 1456
87 | 19588.5028 | 22248.9851 | 1603
88 | 22016.6295 | 25589.6363 | 1899
89 | 28132.3599 | 30920.6385 | 2252
90 | 38734.9776 | 38745.7733 | 3086
91 | 43439.8514 | 41652.8419 | 3526
92 | 44414.7352 | 41809.7575 | 3315
93 | 37475.5005 | 36932.9848 | 2766
94 | 31826.0188 | 32126.4112 | 2468
95 | 28762.0984 | 31028.0018 | 2284
96 | 29670.3752 | 32118.1525 | 2426
97 | 32782.737 | 33625.3685 | 2715
98 | 37669.2662 | 38675.5742 | 3178
99 | 50094.4927 | 47983.1489 | 3978
100 | 50887.7211 | 48317.627 | 4978
101 | 66897.6134 | 61651.3264 | 4602
102 | 68877.6568 | 63587.9957 | 5316
103 | 66504.0268 | 62200.5312 | 4945
104 | 59861.4958 | 56167.538 | 4933
105 | 54327.0626 | 53206.7878 | 4683
106 | 54212.0142 | 52054.6967 | 4904
107 | 57233.5483 | 55159.9744 | 5361
108 | 65135.5564 | 61172.3208 | 6047
109 | 77742.4383 | 73031.8392 | 7117
110 | 96967.6304 | 89735.0954 | 8777
111 | 114115.896 | 101532.673 | 10212
112 | 115865.843 | 102593.919 | 10374
113 | 99977.0542 | 90267.7827 | 9348
114 | 85783.7155 | 80501.8492 | 8594
115 | 81666.194 | 76033.883 | 8582
116 | 85214.5287 | 76727.6153 | 9247
117 | 88641.7598 | 79642.9426 | 10058
118 | 98214.9972 | 88207.2327 | 11458
119 | 116774.12 | 102602.177 | 13897
120 | 148376.097 | 128472.61 | 17219
121 | 191846.224 | 159182.651 | 22351
122 | 221873.855 | 183426.117 | 25397
123 | 215588.58 | 177603.721 | 24829
124 | 168909.208 | 143379.595 | 20151
125 | 134443.131 | 115650.951 | 17103
126 | 119547.391 | 104823.772 | 16914
127 | 123253.161 | 107887.757 | 18354
128 | 133613.571 | 115605.528 | 20682
129 | 152814.543 | 129438.88 | 23258
130 | 178736.762 | 155924.587 | 28592
131 | 224622.906 | 188067.516 | 36576
132 | 280808.908 | 234440.215 | 47251
133 | 333131.707 | 276683.555 | 57525
134 | 339907.452 | 280536.247 | 59591
135 | 291865.665 | 243396.794 | 54337
136 | 246645.591 | 204498.234 | 50144
137 | 236491.056 | 193827.971 | 53729
138 | 255867.628 | 207343.363 | 63398
139 | 295868.138 | 244363.064 | 78271
140 | 354131.066 | 295988.307 | 96596
141 | 401791.377 | 343785.635 | 110936
142 | 418909.367 | 362132.376 | 118556
143 | 418564.222 | 364655.414 | 121296
144 | 418213.022 | 365960.292 | 125102
145 | 415790.95 | 363618.945 | 130557
146 | 385987.361 | 337583.338 | 134678
147 | 360489.004 | 313653.704 | 139586
148 | 353943.356 | 304057.074 | 150072
149 | 362687.034 | 310907.681 | 163436
150 | 383631.897 | 331847.659 | 178397
151 | 413683.748 | 367376.662 | 195570
152 | 446091.064 | 403347.506 | 210262
153 | 448652.405 | 410627.565 | 215911
154 | 425564.009 | 390885.101 | 216568
155 | 409753.937 | 375044.881 | 223618
156 | 406998.831 | 364622.379 | 242943
157 | 408203.811 | 363412.477 | 271460
158 | 416880.882 | 368929.301 | 308096
159 | 434743.659 | 386099.174 | 345628
160 | 446666.306 | 404082.532 | 378777
161 | 453084.795 | 418754.144 | 404958
162 | 449808.944 | 423007.383 | 436486
163 | 462942.626 | 430469.134 | 486903
164 | 476584.944 | 442246.066 | 558909
165 | 488083.728 | 445896.419 | 623138
166 | 484038.868 | 437022.427 | 657408
167 | 469554.881 | 430138.786 | 670547
168 | 451849.539 | 421310.217 | 672348
169 | 432067.27 | 414979.91 | 662314
170 | 410347.345 | 403169.944 | 648068
171 | 387537.487 | 391421.918 | 626438
172 | 368942.033 | 377828.069 | 613248
173 | 350461.628 | 370304.377 | 610963
174 | 339992.224 | 359402.87 | 616896
175 | 328087.743 | 354881.222 | 619799
176 | 314306.157 | 344124.242 | 614507
177 | 298005.616 | 330745.12 | 591992
178 | 278035.637 | 314904.9 | 559319
179 | 260530.115 | 303350.954 | 523767
180 | 238113.844 | 287106.057 | 486651
181 | 218168.086 | 271587.926 | 453452
182 | 202509.394 | 256123.478 | 426739
183 | 186862.812 | 242525.499 | 402693
184 | 174068.22 | 230884.837 | 386125
185 | 161128.303 | 220681.191 | 374770
186 | 150180.54 | 208334.409 | 363365
187 | 140298.489 | 198139.022 | 351750
188 | 130507.265 | 187956.023 | 339394
189 | 122235.891 | 178252.03 | 320487
190 | 113716.254 | 169233.51 | 299582
191 | 103325.568 | 159356.084 | 278594
192 | 94775.6557 | 149561.245 | 258141
193 | 87939.3591 | 140695.512 | 239078
194 | 80539.9308 | 130954.355 | 225250
195 | 74817.787 | 123884.892 | 213210
196 | 70645.769 | 116522.246 | 204351
197 | 63882.1344 | 109605.57 | 196298
198 | 59056.1571 | 103110.088 | 189910
199 | 55374.6085 | 98320.0323 | 180675
200 | 51838.3842 | 92993.1595 | 172164
201 | 48598.8636 | 87819.073 | 161500
202 | 44087.7555 | 82690.4094 | 149421
203 | 41901.836 | 77503.9348 | 139531
204 | 37723.7628 | 70661.5873 | 129951
205 | 35453.0708 | 67329.1947 | 120744
206 | 32104.557 | 63389.7865 | 114771
207 | 30390.9415 | 59153.0644 | 108677
208 | 28374.567 | 56192.3141 | 103424
209 | 26140.2061 | 53499.9722 | 99371
210 | 24420.5354 | 49081.5583 | 94738
211 | 23136.8375 | 46884.7395 | 88154
212 | 21532.2152 | 42895.7789 | 82930
213 | 19079.8678 | 40236.4719 | 76873
214 | 17348.0867 | 37255.0748 | 71462
215 | 16258.1546 | 35491.8386 | 66474
216 | 15761.6299 | 32716.9095 | 61730
217 | 14496.0976 | 30065.8612 | 58942
218 | 13624.1519 | 28802.2774 | 55677
219 | 12952.0271 | 27200.0862 | 52944
220 | 11516.9497 | 25081.7252 | 50446
221 | 11323.184 | 23206.9963 | 47643
222 | 10584.4522 | 22459.5823 | 44112
223 | 9760.94793 | 20750.0278 | 41517
224 | 9113.04382 | 19296.4935 | 38807
225 | 8574.13292 | 18008.1336 | 36644
226 | 8089.71864 | 16249.0268 | 33759
227 | 7883.84256 | 15398.3789 | 32127
228 | 6793.91041 | 14407.3327 | 30947
229 | 6545.64809 | 13924.1978 | 29433
230 | 6242.88916 | 13094.1967 | 27861
231 | 5728.19897 | 12144.4441 | 26902
232 | 5582.87469 | 11112.1044 | 25394
233 | 5255.89504 | 10612.452 | 24190
234 | 4941.02575 | 10397.7254 | 22850
235 | 4396.05968 | 9319.96274 | 21545
236 | 3923.75575 | 8287.62305 | 20590
237 | 4208.34914 | 8010.95601 | 19692
238 | 3651.27271 | 7284.18887 | 19022
239 | 3742.10039 | 7139.66131 | 18628
240 | 3505.94842 | 6755.63095 | 17904
241 | 3312.1827 | 6239.4611 | 17751
242 | 3221.35503 | 6053.63996 | 16986
243 | 2991.25824 | 5657.22151 | 16639
244 | 2991.25824 | 5677.86831 | 16073
245 | 2543.17502 | 5128.66359 | 15498
246 | 2367.57484 | 4765.28002 | 15162
247 | 2494.73359 | 4426.6726 | 14774
248 | 2252.52645 | 4141.74685 | 14574
249 | 2131.42287 | 4042.64224 | 14231
250 | 2076.92627 | 4013.73672 | 14306
251 | 1980.04341 | 3832.04494 | 14044
252 | 1919.49162 | 3526.47239 | 13829
253 | 2082.98145 | 3472.79073 | 13723
254 | 1877.10537 | 3134.18331 | 13464
255 | 1640.95341 | 3204.38241 | 13434
256 | 1531.96019 | 2820.35204 | 13150
257 | 1677.28448 | 2762.54102 | 12823
258 | 1580.40162 | 2787.31717 | 12774
259 | 1598.56716 | 2494.1327 | 12713
260 | 1392.69108 | 2361.99322 | 12779
261 | 1186.81501 | 2295.92348 | 12662
262 | 1519.84983 | 2316.57027 | 12851
263 | 1398.74626 | 2452.83911 | 12850
264 | 1471.4084 | 2196.81887 | 12819
265 | 1289.75305 | 2072.9381 | 12584
266 | 1223.14608 | 1973.83349 | 12280
267 | 1180.75983 | 1973.83349 | 12021
268 | 1162.59429 | 1788.01235 | 12208
269 | 1017.27001 | 1709.55453 | 12278
270 | 1168.64947 | 1816.91786 | 12571
271 | 1059.65626 | 1763.23619 | 12615
272 | 1077.82179 | 1746.71876 | 12973
273 | 993.049294 | 1713.68389 | 13075
274 | 974.883758 | 1610.44992 | 13129
275 | 1089.93215 | 1523.73339 | 13062
276 | 1071.76662 | 1552.6389 | 12712
277 | 932.497507 | 1560.89762 | 12750
278 | 1011.21483 | 1362.68839 | 12917
279 | 914.331972 | 1358.55904 | 13137
280 | 999.104472 | 1271.8425 | 13614
281 | 914.331972 | 1325.52417 | 14599
282 | 853.780185 | 1478.31044 | 14721
283 | 932.497507 | 1391.59391 | 15192
284 | 950.663043 | 1441.14621 | 15042
285 | 944.607865 | 1226.41955 | 15132
286 | 968.828579 | 1189.25533 | 15183
287 | 769.007685 | 1280.10122 | 15643
288 | 805.338756 | 1267.71314 | 15887
289 | 896.166436 | 1226.41955 | 16424
290 | 829.559471 | 1094.28007 | 17111
291 | 884.056078 | 1280.10122 | 17625
292 | 968.828579 | 1185.12597 | 18017
293 | 762.952506 | 1342.0416 | 18332
294 | 805.338756 | 1164.47917 | 19032
295 | 968.828579 | 1296.61865 | 19504
296 | 702.40072 | 1222.2902 | 20330
297 | 993.049294 | 1193.38468 | 20884
298 | 841.669828 | 1292.48929 | 21544
299 | 853.780185 | 1176.86725 | 22156
300 | 932.497507 | 1218.16084 | 22486
301 | 823.504292 | 1234.67827 | 23103
302 | 878.0009 | 1197.51404 | 23665
303 | 884.056078 | 1267.71314 | 23679
304 | 829.559471 | 1209.90212 | 24359
305 | 1023.32519 | 1218.16084 | 25728
306 | 0 | 1337.91224 | 0
307 | 0 | 1086.02136 | 0
308 | 0 | 1370.94711 | 0
309 | 0 | 1238.80763 | 0
310 | 0 | 1176.86725 | 0
311 | 0 | 1271.8425 | 0
312 | 0 | 1218.16084 | 0
313 | 0 | 1309.00673 | 0
314 | 0 | 1247.06635 | 0
315 | 0 | 1267.71314 | 0
316 | 0 | 1309.00673 | 0
317 | 0 | 1300.74801 | 0
318 | 0 | 1090.15072 | 0
319 | 0 | 1214.03148 | 0
320 | 0 | 1242.93699 | 0
321 | 0 | 1230.54891 | 0
322 | 0 | 1086.02136 | 0
323 | 0 | 1094.28007 | 0
324 | 0 | 1234.67827 | 0
325 | 0 | 1205.77276 | 0
326 | 0 | 1081.892 | 0
327 | 0 | 1065.37456 | 0
328 | 0 | 1024.08097 | 0
329 | 0 | 1086.02136 | 0

To explore the distributions of CpG methylation in cfDNA from CSF, Example Embodiment D examined, in both CSF and plasma, the percentages of CpGs within cfDNA fragments that are methylated at low methylation (LM) levels (<20%), high methylation (HM) levels (>80%) or intermediate methylation (IM) levels (20-80%). CpG sites in CSF, as in plasma, were largely distributed in a biphasic manner, with large numbers of LM sites and very few IM sites. This is consistent with previous reports of DNA methylation analysis in differentiated intact adult cells (Chu T et al., PLoS One. 2011; 6(2):e14723). Compared with plasma, cfDNA in CSF has fewer IM sites and more LM and HM sites (FIG. 28A).
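The LM/IM/HM classification above is a simple binning of per-site methylation rates. A minimal Python sketch, assuming each CpG's methylation rate has already been computed as methylated calls divided by total calls (any read-depth filter, which the text does not specify, is left out); the function name is hypothetical:

```python
import numpy as np

def methylation_level_percentages(rates, low=0.20, high=0.80):
    """Percentage of CpG sites at LM (<20%), IM (20-80%) and HM (>80%) levels.

    rates: per-CpG methylation rates in [0, 1].
    Sites exactly at 20% or 80% fall in the IM bin.
    """
    rates = np.asarray(rates, dtype=float)
    n = rates.size
    lm = 100.0 * np.count_nonzero(rates < low) / n
    hm = 100.0 * np.count_nonzero(rates > high) / n
    return {"LM": lm, "IM": 100.0 - lm - hm, "HM": hm}
```

Calling it once on the CSF rates and once on the plasma rates gives the two distributions compared in FIG. 28A.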

Example Embodiment D further explored the distributions of CpG methylation within distinct genomic elements, specifically introns, exons, promoters, CpG islands (CGIs) and CGI shores. The relationship between Group (CSF or plasma) and Response (<20%, 20-80% or >80% methylation) was tested after controlling for Category (intergenic region, intron, exon or promoter) using the Cochran-Mantel-Haenszel test. The resulting p-value (<0.0001) indicates that the numbers of sites at <20%, 20-80% and >80% methylation differ significantly between CSF and plasma after adjusting for category. Specifically, the CSF DNA methylome contains a greater percentage of sites with >80% methylation in exons and introns than plasma does (FIG. 28B). DNA methylation distributions in CGIs and CGI shores appeared largely similar between one of the CSF samples (LP374) and the plasma samples. In general, methylation rates in these samples were lowest in CGIs, followed by CGI shores and then other regions (FIG. 28C).
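The text does not state which Cochran-Mantel-Haenszel variant was used. Because each stratum here is a 2 x 3 table (group by methylation level), one natural choice is the generalized CMH test of general association; the sketch below is a from-scratch NumPy/SciPy version of that statistic, offered as an illustration rather than as the analysis actually run. Its input is a hypothetical array of CpG-site counts per category, group and methylation level.

```python
import numpy as np
from scipy.stats import chi2

def generalized_cmh(tables):
    """Generalized Cochran-Mantel-Haenszel test of general association.

    tables: counts of shape (K, I, J): K strata (genomic category),
    I groups (CSF, plasma), J response levels (<20%, 20-80%, >80%).
    Returns (statistic, degrees_of_freedom, p_value).
    """
    tables = np.asarray(tables, dtype=float)
    _, I, J = tables.shape
    df = (I - 1) * (J - 1)
    diff = np.zeros(df)
    cov = np.zeros((df, df))
    for t in tables:
        n = t.sum()
        if n < 2:
            continue  # a stratum this small carries no information
        row_p = t.sum(axis=1) / n
        col_p = t.sum(axis=0) / n
        # Observed minus expected counts for the first (I-1) x (J-1) cells
        expected = n * np.outer(row_p, col_p)
        diff += (t - expected)[:-1, :-1].ravel()
        # Null covariance of those cells with both margins fixed (hypergeometric)
        vr = np.diag(row_p[:-1]) - np.outer(row_p[:-1], row_p[:-1])
        vc = np.diag(col_p[:-1]) - np.outer(col_p[:-1], col_p[:-1])
        cov += n ** 2 / (n - 1) * np.kron(vr, vc)
    stat = float(diff @ np.linalg.solve(cov, diff))
    return stat, df, float(chi2.sf(stat, df))
```

For 2 x 2 strata this reduces to the familiar CMH statistic without continuity correction. The test uses (I-1)(J-1) degrees of freedom, so a CSF-vs-plasma by three-level comparison has 2.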

Example Embodiment D further explored how the methylation signatures of cfDNA fragments from CSF differ from those of plasma in a gene- or locus-specific fashion. A sliding-window analysis was performed to identify 250-bp genomic regions whose CpG methylation differs significantly between cfDNA from CSF and plasma. As shown in FIG. 29, numerous such differences were identified in or around known genes. Notably, nearly two-fold more loci were significantly (p≤0.01) more methylated in plasma than in CSF (n=394) than vice versa (n=216). Thus, cfDNA from CSF appears to be relatively hypomethylated compared to cfDNA from plasma. FIG. 30 lists the 100 most differentially methylated loci at which cfDNA from CSF is methylated at a lower rate than cfDNA from plasma. These loci are enriched for loci linked to genes that encode brain- or neuron-specific proteins or that relate to central nervous system molecular physiology. Examples include ADAP1 (Stricker R et al., Biol Chem. 2014; 395(11):1321-40), ANO1 (Cho S J et al., Cell Tissue Res. 2014; 357(3):563-9), ARHGEF7 (Chia R et al., Nat Commun. 2014; 5:5827), ARRB2 (Liu X et al., Proc Natl Acad Sci USA. 2015; 112(14):4483-8), FBXO33 (Sanchez-Mora C et al., Neuropsychopharmacology. 2015; 40(4):915-26), FOXN3, KLF15 (Ohtsuka T et al., Stem Cells. 2011; 29(11):1817-28), PCBP3 (Boschi N M et al., Neuroscience. 2015; 286:1-12), RAPGEF2 (Jiang S Z et al., eNeuro. 2017; 4(5)), RFX3 (Benadiba C et al., PLoS Genet. 2012; 8(3):e1002606), RGS3 (Qiu R et al., Stem Cells. 2010; 28(9):1602-10; Qiu R et al., J Cell Biol. 2008; 181(6):973-83) and RYK (Lanoue V et al., Sci Rep. 2017; 7(1):5965). These account for ~38% of the genes in this group (n=32); the remainder consist largely of as-yet functionally uncharacterized loci, including a number of microRNA-encoding sequences. Interestingly, FOXN3 has been shown to be upregulated and demethylated upon neuronal activation (Grassi D et al., Cereb Cortex. 2017; 27(8):4166-81).

FIG. 31 lists the 100 most differentially methylated loci at which cfDNA from CSF is methylated at a higher rate than cfDNA from plasma. Little or no enrichment is apparent for loci linked to genes encoding brain- or neuron-specific proteins. Only four such genes were found, CNTFR (Askvig J M et al., Exp Neurol. 2012; 233(1):243-52), BAG3 (Santoro A et al., Cell Tissue Res. 2017; 368(2):249-58), KLF15 (Ohtsuka T et al., Stem Cells. 2011; 29(11):1817-28) and SOX1-OT (Ahmad A et al., Cell Mol Life Sci. 2017; 74(22):4245-58), and they account for only ~14% of the genes in this group (n=29).
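The sliding-window comparison can be sketched as follows. The text gives the window size (250 bp) and the significance threshold, but not the step size or the statistical test. The 50-bp step (consistent with the 50-bp spacing of start coordinates in Table 9) and Fisher's exact test on pooled methylated and unmethylated read counts are assumptions, and no multiple-testing correction is applied. The function name is hypothetical.

```python
import numpy as np
from scipy.stats import fisher_exact

def sliding_window_dmrs(pos, meth_a, tot_a, meth_b, tot_b,
                        window=250, step=50, alpha=0.01):
    """Flag windows whose pooled methylation differs between groups a and b.

    pos: CpG coordinates on one chromosome.
    meth_*/tot_*: methylated and total read counts per CpG for each group
    (e.g. a = plasma, b = CSF).
    Returns (window_start, pct_a, pct_b, pct_a - pct_b, p_value) tuples.
    """
    pos = np.asarray(pos)
    meth_a, tot_a = np.asarray(meth_a), np.asarray(tot_a)
    meth_b, tot_b = np.asarray(meth_b), np.asarray(tot_b)
    hits = []
    # First grid-aligned window that still reaches the first CpG
    start = max(0, ((pos.min() - window) // step + 1) * step)
    while start <= pos.max():
        in_win = (pos >= start) & (pos < start + window)
        ma, ta = int(meth_a[in_win].sum()), int(tot_a[in_win].sum())
        mb, tb = int(meth_b[in_win].sum()), int(tot_b[in_win].sum())
        if ta > 0 and tb > 0:
            # 2x2 table: rows = group, columns = methylated / unmethylated calls
            _, p = fisher_exact([[ma, ta - ma], [mb, tb - mb]])
            if p <= alpha:
                pa, pb = 100.0 * ma / ta, 100.0 * mb / tb
                hits.append((int(start), pa, pb, pa - pb, float(p)))
        start += step
    return hits
```

With group a as plasma, a positive difference marks a window hypomethylated in CSF, matching the sign convention of the "Plasma Minus CSF" column in Table 9.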

To search for biological themes within the data, Ingenuity Pathways Analysis (IPA) was used to identify ontological patterns that characterize the sample-specific differentially methylated regions. The most significant canonical pathway identified was "axonal guidance", which contains a number of differentially methylated loci, including those that map to ARHGEF7, HKR1, ITGB1, PLCH2, PRKCZ, RGS3 and WNT8A. All 45 "top diseases and bio-functions" identified by IPA were related to neurobiology, with the top three being 1) branching of neurites, 2) development of neurons and 3) developmental process of synapse (Table 8).
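IPA is proprietary software, and its scoring is not described here. As a hedged illustration of how an over-representation p-value of the kind listed in Table 8 can be computed, the sketch below uses a one-sided hypergeometric test, which is equivalent to a right-tailed Fisher's exact test on the gene-overlap table. The gene counts in the usage line are hypothetical.

```python
from scipy.stats import hypergeom

def overrepresentation_p(k, n_list, n_annot, n_universe):
    """Right-tailed over-representation p-value, P(X >= k).

    k: genes shared by the input list and the annotation
    n_list: genes in the input list (e.g. genes near differentially methylated loci)
    n_annot: genes in the annotation (e.g. one bio-function)
    n_universe: genes in the background set
    """
    # hypergeom(M=n_universe, n=n_annot, N=n_list); sf(k - 1) = P(X > k - 1)
    return float(hypergeom.sf(k - 1, n_universe, n_annot, n_list))

# Hypothetical numbers: 5 of 60 input genes fall in a 100-gene annotation
# drawn from a 20,000-gene background.
print(overrepresentation_p(5, 60, 100, 20000))
```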

TABLE 8 Top "diseases and bio-functions" identified by Ingenuity Pathways Analysis software for loci whose methylation levels differ significantly between cfDNA from plasma and CSF

| Categories | Diseases or Functions Annotation | p-Value | Molecules | # Molecules |
|---|---|---|---|---|
| Cell Morphology, Cellular Assembly and Organization, Cellular Development, Cellular Function and Maintenance, Cellular Growth and Proliferation, Embryonic Development, Nervous System Development and Function, Organismal Development, Tissue Development | Branching of neurites | 9.24E−04 | ADAP1, ARHGEF7, ITGB1, RAPGEF2, RYK | 5 |
| Cellular Development, Cellular Growth and Proliferation, Nervous System Development and Function, Tissue Development | Development of neurons | 1.26E−03 | ADAP1, ARHGEF7, ITGB1, PDZRN3, PRKCZ, RAPGEF2, RYK, SUN1 | 8 |
| Cell-To-Cell Signaling and Interaction, Cellular Assembly and Organization, Cellular Development, Cellular Function and Maintenance, Cellular Growth and Proliferation, Nervous System Development and Function, Tissue Development | Developmental process of synapse | 1.74E−03 | ARHGEF7, ITGB1, PDZRN3, SUN1 | 4 |
| Cellular Movement, Nervous System Development and Function | Delay in migration of Schwann cells | 1.82E−03 | ITGB1 | 1 |
| Cell Morphology, Cellular Assembly and Organization, Nervous System Development and Function | Polarization of axons | 1.82E−03 | PRKCZ | 1 |
| Cell Morphology, Nervous System Development and Function | Polarization of astrocytes | 1.82E−03 | PRKCZ | 1 |
| Cellular Assembly and Organization, Cellular Movement, Nervous System Development and Function | Segregation of sensory axons | 1.82E−03 | ITGB1 | 1 |
| Cell Morphology, Cellular Assembly and Organization, Cellular Development, Cellular Function and Maintenance, Cellular Growth and Proliferation, Nervous System Development and Function, Organismal Development, Tissue Development | Arborization of spinal nerve | 1.82E−03 | ITGB1 | 1 |
| Cellular Movement, Nervous System Development and Function | Migration of granule cells | 2.40E−03 | ITGB1, RGS3 | 2 |
| Cellular Movement, Nervous System Development and Function | Migration of neurons | 2.47E−03 | ITGB1, RAPGEF2, RGS3, SOX1 | 4 |
| Cell Morphology, Cellular Assembly and Organization, Cellular Development, Cellular Function and Maintenance, Cellular Growth and Proliferation, Embryonic Development, Nervous System Development and Function, Organismal Development, Tissue Development | Dendritic growth/branching | 2.92E−03 | ADAP1, ARHGEF7, ITGB1, RAPGEF2 | 4 |
| Cellular Development, Cellular Growth and Proliferation, Nervous System Development and Function | Expansion of satellite cells | 3.63E−03 | ITGB1 | 1 |
| Nervous System Development and Function, Tissue Development | Adhesion of brain tissue | 3.63E−03 | ITGB1 | 1 |
| Cell Morphology, Cellular Assembly and Organization, Cellular Development, Cellular Function and Maintenance, Cellular Growth and Proliferation, Nervous System Development and Function, Organismal Development, Tissue Development | Neuritogenesis | 5.13E−03 | ADAP1, ARHGEF7, ITGB1, PRKCZ, RAPGEF2, RYK | 6 |
| Cell-To-Cell Signaling and Interaction, Cellular Movement, Nervous System Development and Function | Chemoattraction of granule cells | 5.44E−03 | RGS3 | 1 |
| Cellular Movement, Nervous System Development and Function | Chemotaxis of cerebellar granule cell | 5.44E−03 | RGS3 | 1 |
| Cell Morphology, Cellular Assembly and Organization, Cellular Development, Cellular Function and Maintenance, Cellular Growth and Proliferation, Nervous System Development and Function, Organismal Development, Tissue Development | Formation of supernumerary axons | 5.44E−03 | PRKCZ | 1 |
| Cell Morphology, Cellular Assembly and Organization, Cellular Function and Maintenance, Nervous System Development and Function | Extension of neurites | 6.77E−03 | ITGB1, PRKCZ, RAPGEF2 | 3 |
| Cellular Assembly and Organization, Cellular Compromise, Nervous System Development and Function | Retraction of neurites | 6.79E−03 | ARHGEF7, RYK | 2 |
| Nervous System Development and Function, Neurological Disease, Organ Morphology, Organismal Development, Organismal Injury and Abnormalities | Abnormal morphology of cerebellum fissure | 9.05E−03 | SUN1 | 1 |
| Cell Cycle, Nervous System Development and Function | Mitogenesis of nervous tissue cell lines | 9.05E−03 | PRKCZ | 1 |
| Cellular Development, Cellular Growth and Proliferation, Nervous System Development and Function | Proliferation of satellite cells | 1.26E−02 | ITGB1 | 1 |
| Cellular Movement, Endocrine System Development and Function, Nervous System Development and Function | Migration of luteinizing hormone-releasing hormone neurons | 1.26E−02 | ITGB1 | 1 |
| Cell-To-Cell Signaling and Interaction, Hematological System Development and Function, Immune Cell Trafficking, Inflammatory Response, Nervous System Development and Function | Binding of microglia | 1.26E−02 | ITGB1 | 1 |
| Cellular Assembly and Organization, Cellular Compromise, Nervous System Development and Function | Retraction of axons | 1.44E−02 | RYK | 1 |
| Cell-To-Cell Signaling and Interaction, Nervous System Development and Function | Binding of Schwann cells | 1.44E−02 | ITGB1 | 1 |
| Nervous System Development and Function | Analgesic tolerance | 1.62E−02 | ARRB2 | 1 |
| Cellular Assembly and Organization, Nervous System Development and Function | Sorting of axons | 1.80E−02 | ITGB1 | 1 |
| Cell Morphology, Cellular Assembly and Organization, Cellular Development, Cellular Function and Maintenance, Cellular Growth and Proliferation, Embryonic Development, Nervous System Development and Function, Organismal Development, Tissue Development | Extension of dendrites | 1.80E−02 | RAPGEF2 | 1 |
| Nervous System Development and Function | Myelination of peripheral nervous system | 1.80E−02 | ITGB1 | 1 |
| Cell Morphology, Cellular Assembly and Organization, Nervous System Development and Function, Tissue Morphology | Size of dendritic trees | 1.98E−02 | ITGB1 | 1 |
| Cell Morphology, Cellular Assembly and Organization, Cellular Development, Cellular Function and Maintenance, Cellular Growth and Proliferation, Embryonic Development, Nervous System Development and Function, Organismal Development, Tissue Development | Morphogenesis of dendritic spines | 2.16E−02 | ARHGEF7 | 1 |
| Nervous System Development and Function, Tissue Morphology | Quantity of radial glial cells | 2.69E−02 | ITGB1 | 1 |
| Cell-To-Cell Signaling and Interaction, Nervous System Development and Function | Synaptic transmission of CA1 neuron | 2.69E−02 | PRKCZ | 1 |
| Cellular Movement, Nervous System Development and Function | Migration of cerebellar granule cell | 2.87E−02 | RGS3 | 1 |
| Cellular Movement, Nervous System Development and Function | Migration of oligodendrocytes | 3.04E−02 | ITGB1 | 1 |
| Cellular Assembly and Organization, Nervous System Development and Function | Complexity of dendritic trees | 3.04E−02 | ITGB1 | 1 |
| Cell Morphology, Cell-To-Cell Signaling and Interaction, Cellular Assembly and Organization, Nervous System Development and Function, Neurological Disease, Tissue Morphology | Loss of synapse | 3.40E−02 | ITGB1 | 1 |
| Cell Morphology, Cellular Assembly and Organization, Cellular Development, Cellular Function and Maintenance, Cellular Growth and Proliferation, Nervous System Development and Function, Organismal Development, Tissue Development | Neuritogenesis of pheochromocytoma cell lines | 3.75E−02 | ITGB1 | 1 |
| Nervous System Development and Function | Chemically-elicited antinociception | 3.75E−02 | ARRB2 | 1 |
| Cell Death and Survival, Nervous System Development and Function | Survival of oligodendrocytes | 3.92E−02 | ITGB1 | 1 |
| Embryonic Development, Nervous System Development and Function, Organ Development, Organismal Development, Tissue Development | Formation of brain | 4.30E−02 | RAPGEF2, RYK, SOX1, SUN1 | 4 |
| Auditory and Vestibular System Development and Function, Auditory Disease, Cell Morphology, Nervous System Development and Function, Neurological Disease, Organ Morphology, Organismal Development, Organismal Injury and Abnormalities, Tissue Morphology | Abnormal morphology of outer hair cells | 4.44E−02 | SUN1 | 1 |
| Cell Morphology, Cellular Assembly and Organization, Cellular Development, Cellular Function and Maintenance, Cellular Growth and Proliferation, Nervous System Development and Function, Organismal Development, Tissue Development | Formation of dendrites | 4.79E−02 | ITGB1, RAPGEF2 | 2 |

To further explore the functional significance of the data, Example Embodiment D used the Genotype-Tissue Expression (GTEx) project database. GTEx is an ongoing effort to build a comprehensive public resource for studying tissue-specific gene expression and regulation. Example Embodiment D identified genes with low expression in whole blood and high expression in neuronal tissues. The rationale was that leukocytes are the major (though probably not exclusive) contributors to cfDNA in plasma, whereas neuronal tissues are the primary contributors to cfDNA in CSF. Identifying tissue-specific differences in gene expression in this context was therefore expected to help explain the DNA methylation differences identified between cfDNA from plasma and CSF. As expected, a significant number of the differentially methylated regions overlap with genes that are highly expressed in the brain and expressed at low levels in whole blood. Examples of these genes are shown in Table 9 and FIG. 32.
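The text does not specify how GTEx expression values were compared or how windows were mapped to genes. A minimal sketch, assuming per-gene expression values (e.g. median TPM) for brain and whole blood, a pseudocount to avoid division by zero, the at-least-5-fold threshold mentioned for the Table 9 windows, and simple half-open interval overlap between windows and gene bodies. All function names, gene names and coordinates are hypothetical.

```python
def brain_enriched(expr, min_fold=5.0, pseudocount=1.0):
    """Genes whose brain expression is >= min_fold times whole-blood expression.

    expr: dict mapping gene -> (brain_tpm, whole_blood_tpm).
    """
    return {g for g, (brain, blood) in expr.items()
            if (brain + pseudocount) / (blood + pseudocount) >= min_fold}

def windows_in_genes(windows, genes, gene_coords):
    """Pair each window (chrom, start, end) with every selected gene it overlaps.

    gene_coords: dict gene -> (chrom, start, end); intervals are half-open.
    """
    pairs = []
    for chrom, w_start, w_end in windows:
        for g in sorted(genes):
            g_chrom, g_start, g_end = gene_coords[g]
            if g_chrom == chrom and w_start < g_end and g_start < w_end:
                pairs.append(((chrom, w_start, w_end), g))
    return pairs

# Hypothetical example: only GENE_A passes the 5-fold filter and is overlapped.
expr = {"GENE_A": (50.0, 2.0), "GENE_B": (10.0, 9.0)}
coords = {"GENE_A": ("chr1", 1000, 5000), "GENE_B": ("chr1", 8000, 9000)}
print(windows_in_genes([("chr1", 900, 1150)], brain_enriched(expr), coords))
```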

TABLE 9 Differentially methylated regions that overlap with genes whose expressions are high in the brain and low in whole blood.

| Chromosome | Start Coordinate | % Methylation Plasma | % Methylation CSF | Difference (% Methylation Plasma Minus CSF) | p Value | Gene Name |
|---|---|---|---|---|---|---|
| chr7 | 158915201 | 65.14% | 83.05% | −17.91% | 5.72E−07 | WDR60 |
| chr3 | 126345501 | 84.19% | 98.46% | −14.27% | 2.91E−07 | KLF15 |
| chr21 | 45634201 | 88.06% | 99.15% | −11.09% | 1.16E−21 | PCBP3 |
| chr19 | 37293451 | 84.02% | 69.00% | 15.02% | 7.36E−06 | HKR1 |
| chr8 | 139704801 | 16.59% | 1.16% | 15.42% | 1.01E−12 | KCNK9 |
| chr19 | 37293351 | 84.57% | 69.00% | 15.57% | 3.14E−06 | HKR1 |
| chr19 | 37293401 | 85.09% | 69.00% | 16.09% | 1.46E−06 | HKR1 |
| chr19 | 37293301 | 86.43% | 69.89% | 16.54% | 1.56E−06 | HKR1 |
| chr10 | 133113251 | 90.10% | 69.80% | 20.30% | 1.20E−07 | ADGRA1 |
| chr7 | 158915451 | 67.25% | 46.36% | 20.88% | 1.65E−05 | WDR60 |
| chr19 | 37293501 | 81.03% | 54.55% | 26.49% | 1.10E−06 | HKR1 |
| chr10 | 133113051 | 84.77% | 58.10% | 26.68% | 5.04E−08 | ADGRA1 |
| chr3 | 40076901 | 77.97% | 50.30% | 27.67% | 6.05E−19 | MYRIP |
| chr3 | 40077051 | 78.19% | 49.84% | 28.34% | 6.86E−20 | MYRIP |
| chr3 | 40076951 | 79.55% | 50.30% | 29.25% | 2.94E−22 | MYRIP |
| chr3 | 40077001 | 80.28% | 50.30% | 29.98% | 1.27E−23 | MYRIP |
| chr3 | 40077101 | 73.45% | 42.98% | 30.47% | 3.83E−17 | MYRIP |
| chr14 | 89161951 | 85.57% | 48.05% | 37.52% | 1.35E−10 | FOXN3 |
| chr14 | 89162001 | 85.57% | 48.05% | 37.52% | 1.35E−10 | FOXN3 |
| chr14 | 89161901 | 86.40% | 48.05% | 38.35% | 4.91E−11 | FOXN3 |
| chr14 | 89161851 | 87.29% | 48.05% | 39.24% | 1.85E−11 | FOXN3 |
| chr4 | 159421451 | 68.88% | 26.56% | 42.32% | 2.83E−11 | RAPGEF2 |
| chr14 | 89161801 | 90.92% | 48.05% | 42.87% | 1.79E−13 | FOXN3 |
| chr4 | 159421401 | 72.70% | 26.56% | 46.13% | 1.09E−13 | RAPGEF2 |
| chr4 | 159421501 | 72.82% | 26.56% | 46.25% | 5.82E−14 | RAPGEF2 |
| chr4 | 159421551 | 72.82% | 26.56% | 46.25% | 5.82E−14 | RAPGEF2 |
| chr4 | 159421601 | 73.76% | 26.56% | 47.20% | 3.73E−14 | RAPGEF2 |
| chr21 | 45634451 | 77.96% | 29.60% | 48.36% | 3.26E−30 | PCBP3 |
| chr3 | 126345251 | 79.17% | 28.57% | 50.60% | 2.04E−13 | KLF15 |

Of the twenty-nine windows that map to genes whose expression is at least 5-fold higher in brain than in whole blood, 26 were significantly hypomethylated in cfDNA from CSF compared to plasma. Examples of genes displaying elevated expression and altered DNA methylation in brain versus whole blood include KLF15, which is thought to play a critical role in the maintenance of neural stem cells at late embryonic stages and functions as a transcriptional activator that promotes dopamine D2 receptor expression in neurons (Ohtsuka T et al., Stem Cells. 2011; 29(11):1817-28; Zhou J et al., Biochem Biophys Res Commun. 2017; 492(2):269-74); KCNK9, which encodes the protein TASK-3 (a potassium channel protein containing two pore-forming P domains) and is highly expressed in the cerebellum; the synaptic protein-encoding gene ADGRA1 (Pandya N J et al., Sci Rep. 2017; 7(1):12107), which is highly expressed in the frontal cortex; MYRIP, which encodes a scaffolding protein involved in exocytosis and is expressed most highly in the amygdala, anterior cingulate cortex and cerebellum; the transcriptional repressor FOXN3, which is very highly expressed in the cerebellum; and RAPGEF2, which is involved in cerebral cortex development and D1 dopamine receptor-dependent ERK phosphorylation in brain (Jiang S Z et al., eNeuro. 2017; 4(5); Ye T et al., Nat Commun. 2014; 5:4826).

Discussion of this Example

The presently disclosed subject matter of this Example relates to a comprehensive quantitative genome-wide analysis of DNA methylation in human cerebrospinal fluid. The presently disclosed methods and the resulting data of this Example demonstrate that epigenomic liquid biopsy of the human central nervous system can be used for molecular phenotyping of brain DNA methylation signatures.

The presently disclosed subject matter of this Example reveals differences in the physical properties of cfDNA from CSF compared to that from plasma, with CSF-derived fragments being notably shorter. This is reminiscent of fetal cfDNA during pregnancy, which is known to consist of shorter (lower molecular weight) fragments than maternal cfDNA (Yu S C et al., Proc Natl Acad Sci USA. 2014; 111(23):8583-8). The presently disclosed subject matter of this Example further identified a clear periodicity in the fragment sizes of cfDNA in CSF, which suggests the presence of a nucleosome footprint that may provide further information about the molecular phenotype of the CNS, as has been suggested for plasma cfDNA (Teo Y V et al., Aging Cell. 2019; 18(1):e12890; Ulz P et al., Nat Genet. 2016; 48(10):1273-8).
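The text does not describe how the fragment-size periodicity was assessed. One simple illustrative approach, not necessarily the one used here, is to remove the slow trend from a 1-bp-resolution fragment-size histogram and locate the strongest peak in its Fourier spectrum. The smoothing width and the function name below are assumptions.

```python
import numpy as np

def dominant_period(counts, smooth=11):
    """Estimate the dominant periodicity (in bp) of a fragment-size histogram.

    counts: fragment counts at consecutive 1-bp size bins.
    smooth: odd moving-average width used to remove the slow overall trend.
    """
    counts = np.asarray(counts, dtype=float)
    trend = np.convolve(counts, np.ones(smooth) / smooth, mode="same")
    half = smooth // 2
    resid = (counts - trend)[half:-half]      # drop edges distorted by zero padding
    resid -= resid.mean()
    spectrum = np.abs(np.fft.rfft(resid))
    freqs = np.fft.rfftfreq(resid.size, d=1.0)  # cycles per bp
    k = 1 + np.argmax(spectrum[1:])             # skip the zero-frequency bin
    return 1.0 / freqs[k]
```

In practice the input would usually be restricted to the short-fragment range where such oscillations (for example, the roughly 10-bp pattern often described for short cfDNA fragments) are examined. The frequency resolution is limited by the number of size bins analyzed.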

Further evidence that information about the molecular phenotype of the CNS may be obtained through epigenomic analysis of CSF cfDNA comes from the observation that DNA methylation signatures differed clearly between cfDNA from CSF and plasma. Direct comparison of specific loci to identify differentially methylated CpG sites revealed that CSF-derived fragments were generally more likely to be less methylated than their plasma-derived counterparts. Furthermore, differentially methylated sites appeared to be correlated with tissue-specific gene expression patterns of the CNS.

Example Embodiment D reveals notable differences in the global distribution of DNA methylation between cfDNA from CSF and plasma, with fewer CpG sites in an intermediately methylated state (20-80%) in CSF than in plasma. This may reflect the fact that cfDNA in CSF is derived from fewer distinct cell lineages than cfDNA in plasma, which presumably receives contributions from a large and diverse range of organ systems and cell lineages. This is logical if one considers that, in its simplest form, a CpG site on a single haploid DNA copy in a single cell is either fully methylated or unmethylated. Thus, the percent methylation (methylation rate) at a given CpG site in a population of cfDNA fragments is an aggregate of these binary states, likely contributed by many millions or even billions of cells.

Although DNA methylation is, at some level, only an indirect correlate of gene expression in the context of non-invasive molecular phenotyping, it has the distinct advantage of being relatively stable compared to cell-free RNA and thus represents a potentially attractive substrate for analysis. Because of this sample stability, any future clinical analysis of DNA methylation in cfDNA from CSF could be centralized in a specialized testing laboratory after sample transit.

One advantage of the presently disclosed method of this Example is its targeted nature. Solution-phase hybridization capture of ~80 Mb of the human genome enables a systematic analysis of known structural genes and regulatory elements at a higher read depth per unit cost than could be achieved by whole-genome shotgun bisulfite sequencing. Even though the yield of cfDNA per mL of CSF was lower than that of plasma, recently emerging approaches to DNA methylation analysis at single-base resolution will likely increase the efficiency of these assays and enable the generation of richer data sets from many more individual samples, reflecting a range of pathobiological and normal phenotypes.
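The read-depth advantage of targeted capture is simple arithmetic: mean depth over a region is roughly the number of sequenced bases that land in it divided by its size. The sketch below uses hypothetical numbers. The read count, read length and 60% on-target fraction are illustrative assumptions, not values from this Example. ~3.1 Gb approximates the human genome size, and duplicate reads and mapping losses are ignored.

```python
def mean_depth(n_reads, bases_per_read, region_bp, on_target_fraction=1.0):
    """Approximate mean coverage: sequenced bases falling in the region / region size."""
    return n_reads * bases_per_read * on_target_fraction / region_bp

# Same hypothetical sequencing budget for both designs.
reads, read_len = 200e6, 150
wgbs = mean_depth(reads, read_len, 3.1e9)          # whole-genome bisulfite, ~3.1 Gb
capture = mean_depth(reads, read_len, 80e6, 0.6)   # ~80 Mb capture, assumed 60% on target
print(f"WGBS ~{wgbs:.1f}x, capture ~{capture:.0f}x")
```

Under these assumptions the same budget gives roughly 9.7x for whole-genome bisulfite sequencing versus roughly 225x over the captured regions.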

The presently disclosed subject matter of this Example relating to epigenomic liquid biopsy can be used for the molecular profiling of CNS phenotypes in a variety of research and clinical settings.

Example Embodiment E of Example 2: Estimation of Abnormal Spinal Fluid Methylome Variation in Targeted Regions for Diagnosis of CNS Abnormality

Provided below is an algorithm that can be used to diagnose a CNS disorder in a subject. The presently disclosed subject matter of this Example Embodiment E provides that the methylome(s) of the central nervous system, or of structures therein, can be affected by certain abnormalities, and that changes in these methylomes can lead to changes in the methylation patterns of the DNA fragments found in cerebrospinal fluid (CSF), which are released by CNS tissues. An algorithm was developed to identify changes in the methylation patterns of the CSF methylome caused by abnormal CNS phenotypes. The main insight behind this algorithm is that the methylome of the DNA fragments in CSF is a mixture of a variety of component methylomes of CNS origin, and that the proportions of these component methylomes in the mixture vary from subject to subject, even among individuals with a normal CNS phenotype. By modeling the CSF methylome as a linear combination of various component methylomes of CNS origin, the algorithm of this Example can accurately predict the methylation patterns of a new CSF sample under the hypothesis that it is from a normal individual. Consequently, the algorithm exhibited high sensitivity in detecting abnormal methylation patterns in a CSF sample caused by changes in the methylomes of some CNS tissues when the sample is from an affected individual.

Let i be any CpG site in the human genome, $z_{i,j}$ the methylation level of CpG site i in CSF sample j, $p_{i,r,j}$ the proportion of the r-th component methylome $m_{r,j}$ of CNS origin in CSF sample j at site i, and $m_{i,r,j}$ the methylation level of CpG site i in methylome $m_{r,j}$. The hypothesis is:

$$z_{i,j} = \sum_{r=1}^{R} p_{i,r,j}\, m_{i,r,j} \qquad (1)$$

where $p_{i,r,j} \ge 0$, $0 \le m_{i,r,j} \le 1$, and $p_{i,1,j} + \dots + p_{i,R,j} = 1$.

It is further assumed that there is a set S of CpG sites such that, for any CpG site $i \in S$ and any CSF sample j from a normal individual, $m_{i,r,j} = m_{i,r}$ and $p_{i,r,j} = p_{r,j}$.

That is, it is assumed that in any CSF sample from a normal individual, the proportions of the different component methylomes in the mixture are the same for all CpG sites in S. It is also assumed that, when restricted to the set S of CpG sites, CSF samples from all normal individuals share the same set of component methylomes. These are called restricted reference component methylomes (RRCM) and are labeled $m_1^{S}, \dots, m_R^{S}$, or simply $m_1, \dots, m_R$ when there is no ambiguity. For any CSF sample j from a normal individual, its methylome restricted to the CpG sites in S can be expressed as a weighted average of the restricted reference component methylomes. More precisely, let $z_j^{S}$ be the methylome of CSF sample j restricted to S; then, for some mixture vector $p_j = [p_{1,j}, \dots, p_{R,j}]^{T}$:

$$z_j^{S} = \left[m_1^{S}, \dots, m_R^{S}\right] p_j \qquad (2)$$

Finally, it is assumed that S is the union of two disjoint subsets C and T, where T is the union of K non-empty sets $T_k$, $T = \bigcup_{k=1}^{K} T_k$, and the index k denotes the k-th type of abnormal CNS phenotype. The sets $T_k$ need not be disjoint. Moreover, each $T_k$ is itself the union of two disjoint sets $D_k$ and $V_k$, either of which may be empty, but not both. It is assumed that, for any CSF sample, including one from an abnormal individual, the methylome restricted to the CpG sites in C can always be expressed as a weighted average of the restricted reference component methylomes. That is, $z_j^{C} = [m_1^{C}, \dots, m_R^{C}]\, p_j$ regardless of whether sample j is from an abnormal individual. C is called the set of reference CpG sites. On the other hand, for a CSF sample l from an abnormal individual, with methylome denoted $w_l$, the methylome restricted to $S = C \cup T$ can no longer be expressed as a weighted average of the restricted reference component methylomes: $w_l^{S} \neq [m_1^{S}, \dots, m_R^{S}]\, p_l$ for any mixture vector $p_l$. More specifically, for a CSF sample l from the k-th type of abnormal individual:

1) $w_l^{C} = [m_1^{C}, \dots, m_R^{C}]\, p_l$;
2) if $D_k$ is non-empty, then $w_l^{D_k} = [m_{1,k}^{D_k}, \dots, m_{R,k}^{D_k}]\, p_l$, where $[m_1^{D_k}, \dots, m_R^{D_k}] \neq [m_{1,k}^{D_k}, \dots, m_{R,k}^{D_k}]$; and
3) if $V_k$ is non-empty, then $w_l^{V_k} = [m_1^{V_k}, \dots, m_R^{V_k}]\, q_l$, where $q_l \neq p_l$.

In other words, in a CSF sample from the k-th type of abnormal individual, if $D_k$ is non-empty, the component methylomes of sample l restricted to $D_k$ are no longer the same as the reference component methylomes restricted to $D_k$. If $V_k$ is non-empty, the proportions of the reference component methylomes restricted to $V_k$ in this sample are no longer the same as their proportions restricted to C.

T is called the target set of CpG sites, $D_k$ is called the differential methylation target set, $V_k$ is called the copy number variation target set, and $T_k$ is called the target set for the k-th type of abnormal individual.
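To make the roles of C, $D_k$ and $V_k$ concrete, the following toy simulation generates one sample's methylation levels under conditions 1) to 3) above. It is an illustration with hypothetical inputs, not part of the disclosed method. Sites in C use the reference methylomes and mixture vector p. Sites in $D_k$ use altered component methylomes with the same p. Sites in $V_k$ use the reference methylomes with a different mixture vector q. The function name and argument layout are assumptions.

```python
import numpy as np

def simulate_sample(M, p, D=(), M_D_abnormal=None, V=(), q=None):
    """Methylation levels of one CSF sample under the mixture model (toy example).

    M: (n_sites, R) restricted reference component methylomes, columns m_1..m_R.
    p: (R,) mixture vector; used at every site of a normal sample and at C.
    D: site indices of D_k; their component methylomes are replaced by the
       rows of M_D_abnormal (shape (len(D), R)) while the mixture p is kept.
    V: site indices of V_k; these keep the reference methylomes but use
       a different mixture vector q.
    """
    M = np.asarray(M, dtype=float)
    p = np.asarray(p, dtype=float)
    z = M @ p                                   # eq. (2): z = [m_1, ..., m_R] p
    D, V = list(D), list(V)
    if D:                                       # condition 2): altered methylomes
        z[D] = np.asarray(M_D_abnormal, dtype=float) @ p
    if V:                                       # condition 3): altered proportions
        z[V] = M[V] @ np.asarray(q, dtype=float)
    return z
```

With no D or V supplied, the function returns a normal sample $z_j$. With them, it returns an abnormal sample $w_l$ that still agrees with the reference model on C.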

The main steps of the presently disclosed algorithm are:

1) Identify the set of reference CpG sites C and the target sets $T_1, \dots, T_K$ for the K types of abnormal individuals.
2) Estimate the restricted reference component methylomes $m_1, \dots, m_R$, or R predictor methylomes $n_1, \dots, n_R$ that are independent linear combinations of the reference component methylomes, such that $n_r = [m_1, \dots, m_R]\, q_r$ for R linearly independent mixture vectors $q_1, \dots, q_R$.
3) (Optional) If the reference component methylomes are available, estimate the proportions of these components at the reference CpG sites C for the test CSF samples.
4) Predict the methylation levels of the test CSF samples at the target set $T_k$ of CpG sites, under the hypothesis that the sample is from a normal individual.
5) Compare the predicted methylation levels at $D_k$ and $V_k$ against the observed methylation levels, and reject the null hypothesis that a test sample is from a normal individual if the observed methylation levels differ significantly from the predicted levels.

The presently disclosed algorithm of this Example can be implemented in a variety of ways. For example, given the methyl-seq data for a set of CSF samples from normal individuals, the presently disclosed EM algorithm or the data augmentation method can be applied to estimate the component methylomes, and the maximum likelihood method can then be used to estimate the proportions of these component methylomes in the test sample. Below are exemplary simple implementations of the presently disclosed algorithm that use linear regression.
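In the simplest linear-regression case, a two-component mixture like the one in the first implementation, the mixture proportion has a closed form. Writing $z \approx p\,x + (1-p)\,y$ as $z - y \approx p\,(x - y)$ gives a one-parameter regression through the origin, so $p = \langle z - y, x - y\rangle / \lVert x - y\rVert^2$. The sketch below is a hedged illustration with hypothetical function names. Here x and y stand for the two reference methylomes (e.g. oligodendrocyte and neuronal) at the same CpG sites.

```python
import numpy as np

def estimate_fraction(z_C, x_C, y_C):
    """Least-squares p in z ~= p*x + (1 - p)*y over the reference sites C."""
    d = x_C - y_C
    p = float(np.dot(z_C - y_C, d) / np.dot(d, d))
    return min(max(p, 0.0), 1.0)          # keep p a valid proportion

def predict_targets(p, x_T, y_T):
    """Predicted CSF methylation at target sites under the 'normal' hypothesis."""
    return p * x_T + (1.0 - p) * y_T
```

Clipping p to [0, 1] is a simple safeguard; a constrained fit would give the same result whenever the unconstrained estimate already lies in that range.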

In the first simple implementation of the presently disclosed algorithm of this Example, it is assumed that the restricted methylome of a CSF sample from a normal individual can be approximated by a mixture of two restricted reference methylomes, one representing the DNA fragments from a first specific CNS region and another representing the DNA fragments from a second specific CNS region. It is further assumed that estimates of these two reference component methylomes are available. For example, in the implementation below, the methylome of oligodendrocytes is used as an approximation to the oligodendrocyte methylome in the CSF sample, and the methylome of neuronal cell samples from healthy individuals (HI) is used as an approximation to the neuronal methylome in the CSF sample. The implementation of the algorithm includes the following steps:

1. Identify the reference set C, and the target sets T₁, . . . , T_(K).

1.1 Collect the methylation data for a set of oligodendrocyte samples, a set of neuronal cell samples, and a set of CSF samples, all from normal individuals. For each type of abnormal individual, collect a set of oligodendrocyte samples, a set of neuronal cell samples, and a set of CSF samples from that type of abnormal individual. All these samples should be matched for age, race, and other relevant parameters. These are the training data.

1.2 Let x_(i,j) be the observed methylation level of CpG site i in a normal oligodendrocyte sample j, y_(i,l) the observed methylation level of CpG site i in a normal neuronal cell sample l, s_(x,i)² the sample variance of x_(i,j) over all normal oligodendrocyte samples, and s_(y,i)² the sample variance of y_(i,l) over all normal neuronal cell samples. Identify the CpG sites S₀ such that every i∈S₀ has both s_(x,i)²<c₀ and s_(y,i)²<c₀ for some constant c₀. These are CpG sites with stable methylation levels in each type of normal cell.

1.3 Let x_(i,j) be the observed methylation level of CpG site i in an oligodendrocyte sample j, normal or abnormal, y_(i,l) the observed methylation level of CpG site i in a neuronal cell sample l, normal or abnormal, s_(x,i)² the sample variance of x_(i,j) over all oligodendrocyte samples, normal and abnormal, and s_(y,i)² the sample variance of y_(i,l) over all neuronal cell samples, normal and abnormal. Identify the CpG sites S₁ such that every i∈S₁ has both s_(x,i)²<c₀ and s_(y,i)²<c₀ for some constant c₀, the statistical test for the difference between {x_(i,j₀): j₀ is a normal oligodendrocyte sample} and {x_(i,j_(k)): j_(k) is an abnormal oligodendrocyte sample of type k} is not significant for any abnormal type k, and the statistical test for the difference between {y_(i,l₀): l₀ is a normal neuronal cell sample} and {y_(i,l_(k)): l_(k) is an abnormal neuronal cell sample of type k} is not significant for any abnormal type k. These are CpG sites with stable methylation levels in each type of cell, and with no difference in methylation level between normal and any abnormal samples. Let x_(i) be the sample mean of x_(i,j) over all oligodendrocyte samples, normal and abnormal, and y_(i) the sample mean of y_(i,l) over all neuronal cell samples, normal and abnormal. Identify the subset C₀ of S₁ such that every i∈C₀ has |x_(i)−y_(i)|>c₁ for some constant c₁. These are CpG sites that are stably methylated in each cell type, show no difference between the normal and abnormal samples of the same cell type, and are differentially methylated between the two cell types.

1.4 Let x^(C₀) be the vector of x_(i) for all i∈C₀, and y^(C₀) the vector of y_(i) for all i∈C₀, where x_(i) is the mean methylation level at site i in all oligodendrocyte samples and y_(i) the mean methylation level at site i in all neuronal cell samples. Because of the way the set C₀ is selected, there is no difference in the methylation level of any CpG site in C₀ between normal and abnormal oligodendrocyte samples, or between normal and abnormal neuronal cell samples. Let z_(j)^(C₀) be the observed methylation levels of CpG sites in C₀ for a CSF sample j of the k^(th) abnormal type. (For convenience, a normal CSF sample is called a sample of the 0^(th) abnormal type.) For each sample j belonging to the k^(th) abnormal type, regress z_(j)^(C₀) against x^(C₀) and y^(C₀), with the constraints that the intercept must be 0 and the coefficients must be non-negative and add to 1, and obtain the residual e_(j)^(C₀). Identify the subset C₀^(k) of C₀ such that every CpG site i in C₀^(k) has

${\frac{e_{i,k}^{2}}{s_{i,k}^{2}} < c_{2}},$

and e_(i,k)²<c₃ for some constants c₂ and c₃, where e_(i,k)² is the mean of the squared differences between the estimated and observed methylation levels of CpG site i over all CSF samples of the k^(th) abnormal type, and s_(i,k)² is the sample variance of the methylation levels of CpG site i in the same set of CSF samples. Repeat the above procedure for each type of abnormal CSF sample; the intersection C=∩_(k=0)^(K) C₀^(k) of these subsets is the reference set of J CpG sites. These are CpG sites whose methylation levels in both normal and any type of abnormal CSF samples can be accurately predicted by the reference component methylomes from normal individuals.

1.5 Let T₀=S₀\S₁. Let x^(C) and x^(T₀) be the vectors of x_(i) and x_(h) for all i∈C and h∈T₀, respectively, and y^(C) and y^(T₀) the vectors of y_(i) and y_(h) for all i∈C and h∈T₀, respectively, where x_(i), x_(h), y_(i), and y_(h) are the mean methylation levels of normal oligodendrocyte or neuronal cells at sites i and h, respectively. Let z_(j)^(C) and z_(j)^(T₀) be the observed methylation levels of CpG sites in C and T₀, respectively, for a normal CSF sample j; w_(l_(k))^(C) and w_(l_(k))^(T₀) the observed methylation levels of CpG sites in C and T₀, respectively, for a CSF sample l_(k) from an individual with the k^(th) type of abnormality; and w_(l_(g))^(C) and w_(l_(g))^(T₀) the observed methylation levels of CpG sites in C and T₀, respectively, for a CSF sample l_(g) from an individual with the g^(th) type of abnormality, where g≠k. For each j, l_(k), and l_(g), regress z_(j)^(C), w_(l_(k))^(C), and w_(l_(g))^(C), respectively, against x^(C) and y^(C), with the constraints that the intercept must be 0 and the coefficients must be non-negative and add to 1. Apply the fitted models to x^(T₀) and y^(T₀) to predict z_(j)^(T₀), w_(l_(k))^(T₀), and w_(l_(g))^(T₀), respectively, and obtain the differences e_(j)^(T₀), e_(l_(k))^(T₀), and e_(l_(g))^(T₀) between the predicted and observed values. Let e_(i), e_(i,k), and e_(i,g) be the means, at CpG site i, of the differences e_(j)^(T₀) over all normal CSF samples j, of e_(l_(k))^(T₀) over all CSF samples l_(k) of the k^(th) abnormal type, and of e_(l_(g))^(T₀) over all CSF samples l_(g) of the g^(th) abnormal type, respectively. Identify the subset T_(k) of T₀ such that every i∈T_(k) has |e_(i)|<c_(2,0), |e_(i,k)|>c_(2,k), and |e_(i,k)−e_(i,g)|>c_(3,k) for all g≠k, for some constants c_(2,0), c_(2,k), and c_(3,k). T_(k) is the target set for the k^(th) type of abnormal individual. These are the sites where the methylation of a normal CSF sample can be accurately predicted, the observed methylation of a CSF sample of the k^(th) abnormal type deviates from the prediction, and this deviation differs from that of a CSF sample of any other abnormal type.

2. Estimate the oligodendrocyte fraction of each new CSF sample to be tested. Recall that x^(C) and y^(C) are the mean vectors of the methylation levels of the training oligodendrocyte and training neuronal cell data for the CpG sites in the reference set C. For any new CSF sample t to be tested, let z_(t)^(C) be the observed methylation levels of CpG sites in C. Regress z_(t)^(C) against x^(C) and y^(C), with the constraints that the intercept must be 0 and the coefficients must be non-negative and add to 1. The estimated coefficient for x^(C) is the estimated oligodendrocyte fraction for the CSF sample t.

3. Test whether the new CSF sample is from the k^(th) type of abnormal individual. For the new CSF sample t, let x^(T_(k)) and y^(T_(k)) be the mean vectors of the methylation levels of the training oligodendrocyte and training neuronal cell data for the CpG sites in the target set T_(k) identified in step 1 of this algorithm. Apply the fitted regression model obtained in step 2 of this algorithm to x^(T_(k)) and y^(T_(k)) to predict the methylation levels of CpG sites in T_(k) for sample t under the hypothesis that sample t is from a normal individual. Let n_(k) be the number of CpG sites in T_(k). Define the functions

$f_{k}(x_{1},\ldots,x_{n_{k}}) = \sum_{i}(-1)^{I(e_{i,k}-e_{i})}x_{i} \quad\text{and}\quad f_{k,g}(x_{1},\ldots,x_{n_{k}}) = \sum_{i}(-1)^{I(e_{i,k}-e_{i,g})}x_{i},$

where I(⋅)=I_((−∞,0))(⋅), that is, the indicator function for the interval (−∞, 0), and e_(i), e_(i,k), and e_(i,g) are the estimates obtained in step 1.5 of the algorithm. The sample is said to be from the k^(th) type of abnormal individual if

$f_{k}(e_{1,t}-e_{1},\ldots,e_{n_{k},t}-e_{n_{k}}) > c_{4,k} \quad\text{and}\quad f_{k,g}(e_{1,t}-e_{1,g},\ldots,e_{n_{k},t}-e_{n_{k},g}) > c_{5,g}$

for all g≠k, where e_(i,t) is the difference between the observed methylation level of CpG site i∈T_(k) for sample t and the value predicted by the fitted model obtained in step 2, and g is any type of abnormal individual that is different from the k^(th) type.
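The site selection in steps 1.3 and 1.5, the two-component constrained regression used in steps 1.4, 1.5, and 2, and the decision rule of step 3 can be sketched as follows. Several assumptions should be read as such: the text does not name the statistical test used in step 1.3, so a per-site Welch t statistic with a hypothetical cut-off `t_crit` stands in for it; methylation data are assumed to be NumPy arrays with rows as CpG sites and columns as samples, with abnormal types keyed in dicts; all thresholds are supplied by the caller; and all function names are hypothetical. With only two components, the constraints (zero intercept, non-negative coefficients summing to 1) leave a single free weight p, so the constrained fit is the one-dimensional least-squares solution clipped to [0, 1].

```python
import numpy as np


def stable_sites(x, y, c0):
    """Variance filter of steps 1.2/1.3 (rows = CpG sites, columns = samples)."""
    return (np.var(x, axis=1, ddof=1) < c0) & (np.var(y, axis=1, ddof=1) < c0)


def welch_t(a, b, eps=1e-12):
    """Per-site Welch t statistic between sample groups a and b (stand-in test)."""
    se2 = np.var(a, axis=1, ddof=1) / a.shape[1] + np.var(b, axis=1, ddof=1) / b.shape[1]
    return (a.mean(axis=1) - b.mean(axis=1)) / np.sqrt(se2 + eps)


def select_C0(x_norm, x_abn, y_norm, y_abn, c0, c1, t_crit=2.0):
    """Step 1.3: sites stable in each cell type, with no normal/abnormal difference,
    and |x_i - y_i| > c1. x_abn, y_abn: dicts {k: sites x samples matrix}."""
    x_all = np.hstack([x_norm, *x_abn.values()])
    y_all = np.hstack([y_norm, *y_abn.values()])
    keep = stable_sites(x_all, y_all, c0)
    for k in x_abn:
        keep &= np.abs(welch_t(x_norm, x_abn[k])) < t_crit  # "not significant" (assumed rule)
        keep &= np.abs(welch_t(y_norm, y_abn[k])) < t_crit
    keep &= np.abs(x_all.mean(axis=1) - y_all.mean(axis=1)) > c1
    return np.flatnonzero(keep)


def two_component_fit(z, x, y):
    """Steps 1.4, 1.5 and 2: weight p in z ~ p*x + (1-p)*y, zero intercept, 0 <= p <= 1."""
    d = x - y
    return float(np.clip(np.dot(d, z - y) / np.dot(d, d), 0.0, 1.0))


def predict(p, x_target, y_target):
    """Apply a fitted weight p to mean levels at other sites (T_0 or T_k)."""
    return p * x_target + (1.0 - p) * y_target


def select_Tk(e0, ek, e_others, c20, c2k, c3k):
    """Step 1.5: positions in T_0 kept in T_k, from mean residuals per site."""
    keep = (np.abs(e0) < c20) & (np.abs(ek) > c2k)
    for eg in e_others.values():
        keep &= np.abs(ek - eg) > c3k
    return np.flatnonzero(keep)


def signed_sum(diffs, direction):
    """f(x_1, ..., x_n) = sum_i (-1)^I(direction_i < 0) * x_i."""
    return float(np.sum(np.where(direction < 0, -1.0, 1.0) * diffs))


def is_type_k(e_t, e0, ek, e_others, c4k, c5):
    """Step 3 decision rule on T_k. e_t: residuals of test sample t;
    e0, ek: mean training residuals (normal, type k); e_others, c5: dicts over g != k."""
    if signed_sum(e_t - e0, ek - e0) <= c4k:
        return False
    return all(signed_sum(e_t - e_others[g], ek - e_others[g]) > c5[g] for g in e_others)
```

The signed sums reward deviations of the test sample that point in the same direction as the training deviations of the k^(th) abnormal type, so a normal-looking sample scores near zero.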

Other ways of implementing the presently disclosed algorithm of this Example can be developed by modifying the simple implementation presented above. Specifically, it is not necessary to assume that only two component reference methylomes make up the CSF methylomes, nor to approximate them by the oligodendrocyte and HI methylomes. Instead, a set of predictor methylomes that are mixtures of the component reference methylomes can be collected, as long as the number of predictor methylomes equals the number of reference component methylomes and the mixture vectors of the predictor methylomes are linearly independent. For example, they can be methylomes of CSF samples with known, different proportions of oligodendrocyte and neuronal cell DNA.
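Why linearly independent mixture vectors suffice can be seen in a small numerical sketch with hypothetical data. If the predictor methylomes are the columns of N = MQ, where the columns of the invertible matrix Q are the mixture vectors q_(r), then N and M span the same space, so a fit on the predictor methylomes recovers the component proportions as p = Qa. For simplicity the sketch uses unconstrained least squares (`numpy.linalg.lstsq`) rather than the constrained regression above, because a simplex constraint on p does not translate into the same constraint on a.

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.uniform(size=(500, 2))        # hypothetical component methylomes m_1, m_2 (columns)
Q = np.array([[0.8, 0.3],             # columns are mixture vectors q_1, q_2,
              [0.2, 0.7]])            # linearly independent (det = 0.5)
N_pred = M @ Q                        # predictor methylomes n_r = [m_1, m_2] q_r

p_true = np.array([0.35, 0.65])       # component proportions of a test sample
z = M @ p_true                        # noise-free methylation levels of that sample

a, *_ = np.linalg.lstsq(N_pred, z, rcond=None)  # coefficients on the predictor methylomes
p_recovered = Q @ a                              # since z = N_pred a = M (Q a), p = Q a
```

If Q were singular, N_pred would span a smaller space than M and the component proportions could no longer be recovered uniquely.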

In the presently disclosed algorithm of this Example, the difference between the observed methylation levels in certain target regions and the predicted methylation levels is used as the test statistic to determine whether the methylome of a CSF sample has been affected by some type of CNS abnormality. To illustrate the advantage of this approach, assume that the mixture vector p_(j) for the methylome of a normal CSF sample j follows a symmetric Dirichlet distribution with parameters α₁= . . . =α_(R). Furthermore, assume that for CpG site i, its methylation levels in the R reference component methylomes are m_(i,r)=(r−1)/(R−1), r=1, . . . , R. It can be shown that the methylation level of site i in sample j then has a mean of 0.5 and a variance of

$\frac{R + 1}{12\left( {R - 1} \right)\left( {{R\;\alpha_{1}} + 1} \right)}.$

If a methyl-seq library prepared from sample j has a coverage of N at CpG site i, the variance of the measured methylation level z_(i,j) is

$\sigma_{1}^{2} = {\frac{1}{4N} + {\frac{N - 1}{N}{\frac{R + 1}{12\left( {R - 1} \right)\left( {{R\;\alpha_{1}} + 1} \right)}.}}}$

In other words, if z_(i,j) is used as a test statistic to detect CNS abnormality using a CSF sample, then under the null hypothesis the test statistic has a variance of σ₁². In the presently disclosed algorithm of this Example, however, the mixture vector p_(j) is estimated first, and z_(i,j) is then predicted by Σ_(r)m_(i,r)p_(r,j). Note that methyl-seq data can cover millions of CpG sites in each library, and that the variance of the coefficients in a linear regression model is inversely proportional to the sample size. Thus it is possible to obtain a highly accurate estimate of the mixture vector p_(j), even after taking into account that adjacent CpG sites tend to have correlated methylation levels. Assuming an accurate estimate of Σ_(r)m_(i,r)p_(r,j) can be obtained, the variance of the difference z_(i,j)−Σ_(r)m_(i,r)p_(r,j) between the observed methylation level and the prediction will be

$\frac{1}{4N} - {\frac{1}{N}{\frac{R + 1}{12\left( {R - 1} \right)\left( {{R\;\alpha_{1}} + 1} \right)}.}}$

In other words, under the null hypothesis, the test statistic z_(i,j)−Σ_(r)m_(i,r)p_(r,j) used in the presently disclosed algorithm has a smaller variance than the alternative test statistic z_(i,j): the two variances differ by exactly the between-sample term (R+1)/(12(R−1)(Rα₁+1)), which dominates σ₁² when the coverage N is large. This in turn means that the presently disclosed test achieves a higher power at the same level of type I error.
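The variance formulas above can be checked by simulation. The sketch below draws mixture vectors from a symmetric Dirichlet distribution, uses m_(i,r)=(r−1)/(R−1), models the measured level as the fraction of N methylated reads with each read an independent Bernoulli draw (the sampling model under which the formulas above hold), and treats Σ_(r)m_(i,r)p_(r,j) as known exactly, as the text assumes. The parameter values and function names are arbitrary, hypothetical choices.

```python
import numpy as np


def closed_form(R=4, alpha=1.0, N=30):
    """Variances of z_(i,j) and of z_(i,j) - sum_r m_(i,r) p_(r,j) from the formulas above."""
    v = (R + 1) / (12 * (R - 1) * (R * alpha + 1))   # between-sample variance of the true level
    return 1 / (4 * N) + (N - 1) / N * v, 1 / (4 * N) - v / N


def simulate_variances(R=4, alpha=1.0, N=30, n_samples=200_000, seed=0):
    """Monte Carlo estimates of the same two variances."""
    rng = np.random.default_rng(seed)
    m = np.arange(R) / (R - 1)                            # m_(i,r) = (r - 1)/(R - 1), r = 1..R
    p = rng.dirichlet(np.full(R, alpha), size=n_samples)  # mixture vectors p_j
    mu = np.clip(p @ m, 0.0, 1.0)                         # true methylation level of site i
    z = rng.binomial(N, mu) / N                           # measured level with coverage N
    return np.var(z), np.var(z - mu)
```

With the defaults (R=4, α₁=1, N=30), `closed_form()` gives 38/1080 ≈ 0.0352 for σ₁² and 8/1080 ≈ 0.0074 for the residual variance; their difference, 1/36, is the between-sample term, and `simulate_variances()` should agree with both values to within sampling error.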

Although the presently disclosed subject matter of this Example and its advantages have been described in detail, it should be understood that various changes, substitutions and alterations can be made herein without departing from the spirit and scope of the invention. Moreover, the scope of this Example is not intended to be limited to the particular embodiments of the process, machine, manufacture, and composition of matter, means, methods and steps described in the specification. As one of ordinary skill in the art will readily appreciate from the disclosure of the presently disclosed subject matter, processes, machines, manufacture, compositions of matter, means, methods, or steps, presently existing or later to be developed that perform substantially the same function or achieve substantially the same result as the corresponding embodiments described herein may be utilized according to the presently disclosed subject matter. Accordingly, the appended claims are intended to include within their scope such processes, machines, manufacture, compositions of matter, means, methods, or steps.

Example of Embodiments for Example 2

B1. A method for diagnosing, prognosing, classifying and/or monitoring a CNS disorder in a subject comprising:

(a) obtaining a cerebrospinal fluid sample from the subject;

(b) determining the methylation status and/or level of one or more genomic loci in the cerebrospinal fluid sample;

(c) comparing the methylation status and/or level of the one or more genomic loci to a reference; and

(d) diagnosing the CNS disorder in the subject,

wherein the difference in the methylation status and/or level of the one or more genomic loci in the cerebrospinal fluid sample compared to the reference indicates the presence of the CNS disorder in the subject.

B2. The method of embodiment B1, wherein an increase in the level of methylation of the one or more genomic loci in the cerebrospinal fluid sample indicates the presence of the CNS disorder in the subject or a decrease in the level of methylation of the one or more genomic loci in the cerebrospinal fluid sample indicates the presence of the CNS disorder in the subject.

B3. The method of embodiment B1, wherein a decrease in the level of methylation of at least one of the one or more genomic loci in the cerebrospinal fluid sample and an increase in the level of methylation of at least one of the one or more genomic loci in the cerebrospinal fluid sample indicates the presence of the CNS disorder in the subject.

B4. A method of treating a CNS disorder in a subject comprising:

(a) obtaining a cerebrospinal fluid sample from the subject;

(b) determining the methylation status and/or level of one or more genomic loci present in the cerebrospinal fluid sample;

(c) comparing the methylation status and/or level of the one or more genomic loci to a reference;

(d) diagnosing a CNS disorder in the subject, wherein the difference in the methylation status and/or level of the one or more genomic loci in the cerebrospinal fluid sample compared to the reference indicates the presence of the CNS disorder in the subject; and

(e) treating the subject diagnosed with the CNS disorder.

B5. The method of any one of embodiments B1-B4, wherein the reference is the methylation status and/or level of the one or more genomic loci in a cerebrospinal fluid sample obtained from a subject that does not have the CNS disorder.

B6. A method of treating a CNS disorder in a subject comprising:

(a) measuring the methylation status and/or level of one or more genomic loci present in a cerebrospinal fluid sample from the subject prior to a treatment of the CNS disorder;

(b) measuring the methylation status and/or level of one or more genomic loci present in a cerebrospinal fluid sample from the subject during the treatment of the CNS disorder;

(c) continuing the treatment if the difference in the methylation status and/or level of the one or more genomic loci between the cerebrospinal fluid samples from prior to and during the treatment of the CNS disorder indicates the subject is responsive to the treatment.

B7. The method of embodiment B6, further comprising (d) administering a different treatment to the subject if the difference in the methylation status and/or level of the one or more genomic loci between the cerebrospinal fluid samples from prior to and during the treatment of the CNS disorder indicates the subject is not responsive to the treatment.

B8. The method of embodiment B6, wherein an increase in the level of methylation of the one or more genomic loci in the cerebrospinal fluid sample indicates the subject is responsive to the treatment, or a decrease in the level of methylation of the one or more genomic loci in the cerebrospinal fluid sample indicates the subject is responsive to the treatment.

B9. The method of embodiment B6, wherein a decrease in the level of methylation of at least one of the one or more genomic loci in the cerebrospinal fluid sample and an increase in the level of methylation of at least one of the one or more genomic loci in the cerebrospinal fluid sample indicates the subject is responsive to the treatment.

B10. The method of embodiment B7, wherein an increase in the level of methylation of the one or more genomic loci in the cerebrospinal fluid sample indicates the subject is responsive to the treatment, or a decrease in the level of methylation of the one or more genomic loci in the cerebrospinal fluid sample indicates the subject is not responsive to the treatment.

B11. The method of embodiment B7, wherein a decrease in the level of methylation of at least one of the one or more genomic loci in the cerebrospinal fluid sample and an increase in the level of methylation of at least one of the one or more genomic loci in the cerebrospinal fluid sample indicates the subject is not responsive to the treatment.

B12. The method of any one of embodiments B1-B11, wherein the one or more genomic loci comprise one or more CpG sites.

B13. The method of any one of embodiments B1-B12, wherein the CNS disorder is selected from the group consisting of brain and spinal cord tumors, brain and spinal cord infections, brain and spinal cord inflammations, neuropsychiatric disorders, and neurodegenerative diseases.

B14. The method of any one of embodiments B1-B13, wherein the one or more genomic loci are present within nucleic acids isolated from the cerebrospinal fluid sample.

B15. The method of any one of embodiments B1-B14, wherein the one or more genomic loci are present within cell-free nucleic acids isolated from the cerebrospinal fluid sample.

B16. A method of treating a CNS disorder in a subject comprising:

(a) diagnosing a CNS disorder in the subject by utilization of the algorithm disclosed in Example Embodiment E; and

(b) treating the subject diagnosed with the CNS disorder.

B17. The method of any one of embodiments B1-B16, wherein the subject is human.

B18. A kit for diagnosing, prognosing and/or monitoring a CNS disorder in a subject comprising a means for determining and/or detecting the methylation status of one or more genomic loci.

B19. The kit of embodiment B18, wherein the means comprises one or more primers and/or probes for determining and/or detecting the methylation status of the one or more genomic loci.

Abstract of this Example (Example 2)

Example 2 provides methods for diagnosing, prognosing, monitoring, classifying and/or treating central nervous system disorders, e.g., brain and spinal cord tumors, brain and spinal cord infections, brain and spinal cord inflammations, neuropsychiatric disorders, and neurodegenerative diseases. This Example further provides algorithms and kits for diagnosing, prognosing, monitoring, classifying and/or treating central nervous system disorders.

Example 3—Method for Non-Invasive Detection of Fetal Aneuploidy and/or Sub-Chromosomal Fetal Copy Number Variations by Bisulfite Sequencing of Maternal Plasma

Field of this Example

The methods disclosed in this Example are related to the field of prenatal diagnosis, specifically to non-invasive methods to detect fetal aneuploidy in a biological sample including maternal plasma DNA.

Background of this Example

Definitive prenatal diagnosis is currently performed via amniocentesis (AF) or chorionic villus sampling (CVS) to obtain fetal or placental cells, respectively. Chromosome analysis is then performed using conventional karyotyping or, more recently, array comparative genomic hybridization (aCGH). AF and CVS are invasive procedures that carry a significant risk of miscarriage and fetal morbidity and cause considerable parental stress, and there have been intense efforts to develop non-invasive alternatives.

Summary of this Example

Disclosed in this Example are diagnostic methods that can be used to detect fetal aneuploidy and/or sub-chromosomal fetal copy number variations (e.g., microdeletions and/or microduplications) in maternal plasma while reducing cost and complexity. The method combines DNA methylation signatures, identified using biochemical methods, with a computational approach to detect fetal aneuploidy and/or sub-chromosomal fetal copy number variations.

Description of this Example

Proof of concept has been demonstrated for the detection of pregnancy-related disease via bisulfite sequencing of maternal plasma. One intermediate step of the method is the estimation of the percentage of DNA fragments in the maternal plasma that originate from the fetus. The method has been modified to detect fetal aneuploidy and/or sub-chromosomal fetal copy number variations.
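A worked illustration of why the fetal-fraction estimate matters for this modification follows; it is an idealized calculation under stated assumptions, not the method of this Example. Assume a fraction f of the DNA fragments in maternal plasma is fetal, the mother carries the normal 2 copies of the region, and fragments from a small region are sampled in proportion to copy number. A fetal region present in c copies instead of 2 then changes the region's expected fragment dosage, relative to a euploid pregnancy, by a factor of 1 + f(c − 2)/2. A fetal trisomy (c = 3) raises it only to 1 + f/2, and a fetal microdeletion (c = 1) lowers it to 1 − f/2, so the detectable signal scales with f. The function name is hypothetical.

```python
def expected_relative_dosage(fetal_fraction, fetal_copies, maternal_copies=2):
    """Expected fragment dosage of a region in maternal plasma relative to a
    pregnancy in which mother and fetus both carry 2 copies of the region
    (idealized: dosage proportional to copy number)."""
    f = fetal_fraction
    return ((1.0 - f) * maternal_copies + f * fetal_copies) / 2.0
```

For example, at a fetal fraction of 10%, a fetal trisomy is expected to raise the dosage of the affected chromosome by only about 5%.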

The approach of this Example involves targeted methylation analysis of specific regions of genomic DNA, such as commonly aneuploid chromosomes and/or copy number variation regions (such as microdeletion regions). Such an approach would involve a parallel strategy in which regions of interest are targeted in a multiplex fashion.

Terms of this Example

Unless otherwise noted, technical terms are used according to conventional usage. Definitions of common terms in molecular biology may be found in Benjamin Lewin, Genes V, published by Oxford University Press, 1994 (ISBN 0-19-854287-9); Kendrew et al. (eds.), The Encyclopedia of Molecular Biology, published by Blackwell Science Ltd., 1994 (ISBN 0-632-02182-9); and Robert A. Meyers (ed.), Molecular Biology and Biotechnology: a Comprehensive Desk Reference, published by VCH Publishers, Inc., 1995 (ISBN 1-56081-569-8).

In order to facilitate review of the various embodiments of this Example, the following explanations of specific terms are provided:

Amplification: To increase the number of copies of a nucleic acid molecule. The resulting amplification products are called “amplicons.” Amplification of a nucleic acid molecule (such as a DNA or RNA molecule) refers to use of a technique that increases the number of copies of a nucleic acid molecule in a sample. An example of amplification is the polymerase chain reaction (PCR), in which a sample is contacted with a pair of oligonucleotide primers under conditions that allow for the hybridization of the primers to a nucleic acid template in the sample. The primers are extended under suitable conditions, dissociated from the template, re-annealed, extended, and dissociated to amplify the number of copies of the nucleic acid. This cycle can be repeated. The product of amplification can be characterized by such techniques as electrophoresis, restriction endonuclease cleavage patterns, oligonucleotide hybridization or ligation, and/or nucleic acid sequencing.
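The repeated cycle described above grows the number of copies geometrically. A minimal sketch of the idealized arithmetic, assuming a constant per-cycle efficiency E (E = 1 means perfect doubling); real reactions typically run below this efficiency and plateau in late cycles, which the formula ignores. The function name is hypothetical.

```python
def pcr_copies(initial_copies, cycles, efficiency=1.0):
    """Idealized copy number after PCR: N0 * (1 + E) ** n, with 0 <= E <= 1."""
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles
```

For example, 10 template copies amplified for 20 perfectly efficient cycles give 10 × 2²⁰ = 10,485,760 copies.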

Other examples of in vitro amplification techniques include quantitative real-time PCR; reverse transcriptase PCR (RT-PCR); real-time PCR (rt PCR); real-time reverse transcriptase PCR (rt RT-PCR); nested PCR; strand displacement amplification (see U.S. Pat. No. 5,744,311); transcription-free isothermal amplification (see U.S. Pat. No. 6,033,881); repair chain reaction amplification (see PCT Publication No. WO 90/01069); ligase chain reaction amplification (see European patent publication No. EP-A-320 308); gap filling ligase chain reaction amplification (see U.S. Pat. No. 5,427,930); coupled ligase detection and PCR (see U.S. Pat. No. 6,027,889); and NASBA™ RNA transcription-free amplification (see U.S. Pat. No. 6,025,134), amongst others.

Allele (Haplotype): A 5′ to 3′ sequence of nucleotides found at a set of one or more polymorphic sites in a locus on a single chromosome from a single individual. An “allelic pair” is the two alleles found for a locus in a single individual. With regard to a population, alleles are the ordered, linear combination of polymorphisms (e.g., single nucleotide polymorphisms (SNPs)) in the sequence of each form of a gene (on individual chromosomes) that exist in the population. “Haplotyping” is a process for determining one or more alleles in an individual and includes use of family pedigrees, molecular techniques and/or statistical inference. “Haplotype data” or “allele data” is the information concerning one or more of the following for a specific gene: a listing of the allelic pairs in an individual or in each individual in a population; a listing of the different alleles in a population; the frequency of each allele in that or other populations; and any known associations between one or more alleles and a trait.

Bisulfite: All types of bisulfites, such as sodium bisulfite, that are capable of chemically converting a cytosine (C) to a uracil (U) without chemically modifying a methylated cytosine and therefore can be used to differentially modify a DNA sequence based on the methylation status of the DNA.

Bisulfite treatment: The treatment of DNA with bisulfite or a salt thereof, such as sodium bisulfite (NaHSO₃). Bisulfite reacts readily with the 5,6-double bond of cytosine, but poorly with methylated cytosine. Cytosine reacts with the bisulfite ion to form a sulfonated cytosine reaction intermediate which is susceptible to deamination, giving rise to a sulfonated uracil. The sulfonate group can be removed under alkaline conditions, resulting in the formation of uracil. Uracil is recognized as a thymine by polymerases and amplification will result in an adenine-thymine base pair instead of a cytosine-guanine base pair.
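The effect of bisulfite treatment on sequencing reads can be sketched in silico. The sketch assumes a single top-strand read that aligns to the reference without insertions or deletions, complete conversion of every unmethylated cytosine, and 0-based positions; the function names are hypothetical. After amplification an unmethylated C reads as T while a methylated C still reads as C, so comparing the read with the reference at each CpG gives a per-site methylation call.

```python
def bisulfite_convert(seq, methylated_positions):
    """Top-strand read after bisulfite treatment and amplification:
    unmethylated C -> T, methylated C (0-based index in methylated_positions) stays C."""
    return "".join(
        "T" if base == "C" and i not in methylated_positions else base
        for i, base in enumerate(seq)
    )


def call_cpg_methylation(reference, read):
    """Per-CpG calls for an aligned read: C at a reference CpG => methylated (True),
    T => unmethylated (False); other bases are not called."""
    calls = {}
    for i in range(len(reference) - 1):
        if reference[i:i + 2] == "CG" and read[i] in ("C", "T"):
            calls[i] = read[i] == "C"
    return calls
```

For the reference `ACGTTCGACCA` with only the cytosine at position 1 methylated, the converted read is `ACGTTTGATTA`, and the calls are `{1: True, 5: False}`. Aggregating such calls over all reads covering a site gives the fraction of methylated reads, one common way to express a site's methylation level.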

Cell-free DNA: DNA which is no longer fully contained within an intact cell, for example DNA found in plasma or serum.

Chromosomal abnormality: A chromosome, or a segment of a chromosome, with DNA deletions or duplications, such as chromosomal aneuploidy. The term also encompasses translocation of extra chromosomal sequences to other chromosomes.

Chromosomal aneuploidy or aneuploidy: The abnormal presence (hyperploidy) or absence (hypoploidy) of a chromosome, such as chromosome 13, 18 or 21. In some cases, the abnormality can involve more than one chromosome, or more than one portion of one or more chromosomes. The most common chromosome aneuploidy is trisomy, such as trisomy 21, where the genome of an affected patient has three copies of chromosome 21 instead of two. In rarer cases, the patient may have an extra piece of chromosome 21 (less than full length) in addition to the normal pair. In yet other cases, a portion of chromosome 21 may be translocated to another chromosome, such as chromosome 14. In this example, chromosome 21 is referred to as the “chromosome relevant to the chromosomal aneuploidy” and a second chromosome that is present as the normal pair in the patient's genome, for example chromosome 1, is a “reference chromosome.” There are also cases where the number of copies of a relevant chromosome is less than the normal number of 2. Turner syndrome is one example of a chromosomal aneuploidy in which the number of X chromosomes in a female subject is reduced from two to one.

Chromosomal Copy number variation: A microdeletion is a small deletion in a chromosome, such as a deletion of about 10 kilobases to about 5 million base pairs, for example a deletion of about 100 to about 5 million base pairs, such as about 1,000 kilobases, such as about 100 kilobases, 200 kilobases, 400 kilobases, 500 kilobases, 750 kilobases, or about 1 million base pairs. Similarly, a “microduplication” is a small duplication in a chromosome, such as a duplication of about 10 kilobases to about 5 million base pairs, for example a duplication of about 100 to about 5 million base pairs, such as about 1,000 kilobases, such as about 100 kilobases, 200 kilobases, 400 kilobases, 500 kilobases, 750 kilobases, or about 1 million base pairs. A chromosomal fetal copy number variation is either a microdeletion or a microduplication in genomic DNA of a fetus of a pregnant woman.

CpG-containing genomic sequence: A segment of DNA sequence at a defined location in the genome of an individual such as a human fetus or a pregnant woman. Typically, a “CpG-containing genomic sequence” is at least 15 nucleotides in length and contains at least one cytosine. In some embodiments, a CpG-containing sequence can be at least 30, 50, 80, 100, 150, 200, 250, or 300 nucleotides in length and contain at least 2, 5, 10, 15, 20, 25, or 30 cytosines. For any specific “CpG-containing genomic sequence” at a given location, for example, within a region centering around a given genetic locus on chromosome 21, nucleotide sequence variations can exist from individual to individual and from allele to allele even for the same individual. Typically, such a region centering around a defined genetic locus (e.g., a CpG island) contains the locus as well as upstream and/or downstream sequences. Each of the upstream or downstream sequences (counting from the 5′ or 3′ boundary of the genetic locus, respectively) can be as long as 1 kb, and in other cases may be as long as 5 kb, 2 kb, 750 bp, 500 bp, 200 bp, or 100 bp. A “CpG-containing genomic sequence” can encompass a coding or a non-coding nucleic acid sequence, and thus can include a nucleotide sequence transcribed (or not transcribed) for protein production. Thus, a CpG-containing genomic sequence can be a protein-coding sequence, a non-protein-coding sequence, or a combination thereof.

Control DNA: Genomic DNA obtained from an individual that is used for comparative purposes, such as DNA from a healthy individual who does not have a chromosomal abnormality. In some embodiments, a control DNA sample can be obtained from plasma of a female carrying a healthy fetus who does not have a chromosomal abnormality, which can serve as a negative control. When certain chromosome anomalies are known, the control can also be established standards that are indicative of a specific disease or condition.

To screen for three different chromosomal aneuploidies in the maternal plasma of a pregnant female, a panel of control DNAs isolated from the plasma of mothers known to carry a fetus with, for example, trisomy 13, 18, or 21, together with control DNA from a mother pregnant with a fetus that does not have a chromosomal abnormality, can be used as controls.

Copy number: The number of copies of a section of DNA in a genome. Copy number analysis usually refers to the process of analyzing data produced by a test for DNA copy number variation in a patient's sample. Such analysis helps detect chromosomal copy number variation that may cause, or may increase the risk of, various critical disorders. Copy number variation can be detected with various types of tests, including, but not limited to, methylation status analysis, fluorescent in situ hybridization, comparative genomic hybridization, and high-resolution array-based tests based on array comparative genomic hybridization and SNP array technologies. The methods disclosed herein can be used to determine the copy number of a specific locus of interest.

DNA (deoxyribonucleic acid): DNA is a long chain polymer which comprises the genetic material of most living organisms (some viruses have genes comprising ribonucleic acid (RNA)). The repeating units in DNA polymers are four different nucleotides, each of which comprises one of the four bases, adenine, guanine, cytosine and thymine bound to a deoxyribose sugar to which a phosphate group is attached. Triplets of nucleotides (referred to as codons) code for each amino acid in a polypeptide, or for a stop signal (termination codon). The term codon is also used for the corresponding (and complementary) sequences of three nucleotides in the mRNA into which the DNA sequence is transcribed.

Unless otherwise specified, any reference to a DNA molecule is intended to include the reverse complement of that DNA molecule. Except where single-strandedness is required by the text herein, DNA molecules, though written to depict only a single strand, encompass both strands of a double-stranded DNA molecule. Thus, a reference to the nucleic acid molecule that encodes a protein, or a fragment thereof, encompasses both the sense strand and its reverse complement. Thus, for instance, it is appropriate to generate probes or primers from the reverse complement sequence of the disclosed nucleic acid molecules.
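As a minimal illustration of the reverse-complement convention above, the following sketch assumes uppercase input restricted to A, C, G, and T; the function name is hypothetical and not taken from any particular library.

```python
# Watson-Crick complements for the four DNA bases (A-T, C-G).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'.

    Sketch only: ambiguity codes (N, R, Y, ...) are rejected rather than handled.
    """
    seq = seq.upper()
    if set(seq) - set("ACGT"):
        raise ValueError("only A, C, G and T are supported in this sketch")
    # Complement each base, then reverse so the result also reads 5'->3'.
    return seq.translate(_COMPLEMENT)[::-1]


print(reverse_complement("ATGCCGGA"))  # TCCGGCAT
```

Palindromic sites such as CCGG are their own reverse complement, which is why a restriction site of this kind is recognized on both strands.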

DNA sequencing: The process of determining the nucleotide order of a given DNA molecule. The general characteristic of “parallel or massively parallel sequencing” is that sequencing of the target genetic material is performed in parallel and the sequence information is captured by a computer. For example, the sequencing can be performed using sequencing-by-synthesis with reversible terminators (ILLUMINA® Genome Analyzer, HiSeq, NextSeq, NovaSeq, etc.), semiconductor sequencing (Thermo Fisher Ion Torrent), or nanopore sequencing (Oxford Nanopore).

Differentially Modifies (methylated or non-methylated DNA): A reagent that modifies methylated or non-methylated DNA, respectively, in a process through which distinguishable products result from methylated and non-methylated DNA, thereby allowing the identification of the DNA methylation status. Such processes may include, but are not limited to, chemical reactions (such as conversion by bisulfite), enzymatic treatment (such as cleavage by a methylation-dependent endonuclease), or binding by an antibody that specifically binds a methylated (or non-methylated) DNA sequence. Thus, an enzyme that preferentially cleaves or digests methylated DNA is one capable of cleaving or digesting a DNA molecule at a significantly higher efficiency when the DNA is methylated, whereas an enzyme that preferentially cleaves or digests unmethylated DNA exhibits a significantly higher efficiency when the DNA is not methylated.

Gene: A segment of DNA that contains the coding sequence for a protein, wherein the segment may include promoters, exons, introns, and other untranslated regions that control expression.

Genotype: An unphased 5′ to 3′ sequence of nucleotide pair(s) found at a set of one or more polymorphic sites in a locus on a pair of homologous chromosomes in an individual. “Genotyping” is a process for determining a genotype of an individual.

Genomic segment: A contiguous sequence of genomic DNA no more than 2000 bases in length.

Hybridization: Oligonucleotides and their analogs hybridize by hydrogen bonding, which includes Watson-Crick, Hoogsteen or reversed Hoogsteen hydrogen bonding, between complementary bases. Generally, nucleic acids consist of nitrogenous bases that are either pyrimidines (cytosine (C), uracil (U), and thymine (T)) or purines (adenine (A) and guanine (G)). These nitrogenous bases form hydrogen bonds between a pyrimidine and a purine, and the bonding of the pyrimidine to the purine is referred to as “base pairing.” More specifically, A will hydrogen bond to T or U, and G will bond to C. “Complementary” refers to the base pairing that occurs between two distinct nucleic acid sequences or two distinct regions of the same nucleic acid sequence. For example, an oligonucleotide can be complementary to a specific genetic locus so that it specifically hybridizes with a mutant allele (and not the reference allele) or so that it specifically hybridizes with a reference allele (and not the mutant allele).

“Specifically hybridizable” and “specifically complementary” are terms that indicate a sufficient degree of complementarity such that stable and specific binding occurs between the oligonucleotide (or its analog) and the DNA or RNA target, such that the target can be distinguished. The oligonucleotide or oligonucleotide analog need not be 100% complementary to its target sequence to be specifically hybridizable. An oligonucleotide or analog is specifically hybridizable when binding of the oligonucleotide or analog to the target DNA or RNA molecule interferes with the normal function of the target DNA or RNA, and there is a sufficient degree of complementarity to avoid non-specific binding of the oligonucleotide or analog to non-target sequences under conditions where specific binding is desired, for example under physiological conditions in the case of in vivo assays or systems. Such binding is referred to as specific hybridization. In one example, an oligonucleotide is specifically hybridizable to DNA or RNA nucleic acid sequences including an allele of a gene, wherein it will not hybridize to nucleic acid sequences containing a polymorphism.

Hybridization conditions resulting in particular degrees of stringency will vary depending upon the nature of the hybridization method of choice and the composition and length of the hybridizing nucleic acid sequences. Generally, the temperature of hybridization and the ionic strength (especially the Na⁺ concentration) of the hybridization buffer will determine the stringency of hybridization, though wash times also influence stringency. Calculations regarding hybridization conditions required for attaining particular degrees of stringency are discussed by Sambrook et al. (ed.), Molecular Cloning: A Laboratory Manual, 2nd ed., vol. 1-3, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., 1989, chapters 9 and 11.

Increase or a Decrease: A statistically significant positive or negative change, respectively, in quantity from a control value. An increase is a positive change, such as a 50%, 100%, 200%, 300%, 400% or 500% increase as compared to the control value. A decrease is a negative change, such as a 50%, 100%, 200%, 300%, 400% or 500% decrease as compared to a control value.

Isolated: An “isolated” biological component (such as a nucleic acid molecule, protein or organelle) has been substantially separated or purified away from other biological components in the cell of the organism in which the component naturally occurs, i.e., other chromosomal and extra-chromosomal DNA and RNA, proteins and organelles. Nucleic acids and proteins that have been “isolated” include nucleic acids and proteins purified by standard purification methods. The term also embraces nucleic acids and proteins prepared by recombinant expression in a host cell as well as chemically synthesized nucleic acids.

Locus: A location on a chromosome or DNA molecule corresponding to a gene or a physical or phenotypic feature, where physical features include polymorphic sites.

Methylation: The addition of a methyl group (—CH₃) to cytosine nucleotides of CpG sites in DNA. DNA methylation, the addition of a methyl group onto a nucleotide, is a post-replicative covalent modification of DNA that is catalyzed by a DNA methyltransferase enzyme. In biological systems, DNA methylation can serve as a mechanism for changing the structure of DNA without altering its coding function or its sequence.

Methylation sequencing assay: A sequencing assay that detects the methylation status of one or more CpG sites in DNA. A non-limiting example of a methylation sequencing assay is a sequencing assay performed on bisulfite-treated and amplified genomic DNA.
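The principle behind a bisulfite-based methylation sequencing assay can be sketched as follows. This is a simplified model, not a description of any specific pipeline. It assumes complete conversion, a single (top) strand, and reads already aligned to the reference without gaps. Under bisulfite treatment an unmethylated C is read as T while a methylated C remains C, so the methylation level of a CpG site is the fraction of informative reads showing C. All function names are hypothetical.

```python
def bisulfite_convert(ref: str, methylated: set[int]) -> str:
    """Simulate complete bisulfite conversion of one strand: every C whose
    0-based position is not in `methylated` is read as T; methylated C stays C."""
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(ref.upper())
    )


def cpg_positions(ref: str) -> list[int]:
    """0-based positions of the cytosine in each CpG dinucleotide of `ref`."""
    ref = ref.upper()
    return [i for i in range(len(ref) - 1) if ref[i:i + 2] == "CG"]


def methylation_level(reads: list[str], pos: int) -> float:
    """Fraction of informative reads showing C (methylated) rather than
    T (unmethylated) at reference position `pos`."""
    meth = sum(1 for r in reads if r[pos] == "C")
    unmeth = sum(1 for r in reads if r[pos] == "T")
    if meth + unmeth == 0:
        raise ValueError("no informative reads at this position")
    return meth / (meth + unmeth)


# Toy example: the first CpG is methylated in two of three molecules,
# the second CpG is unmethylated in all of them.
ref = "ACGTTCGA"
reads = [
    bisulfite_convert(ref, {1}),
    bisulfite_convert(ref, set()),
    bisulfite_convert(ref, {1}),
]
for pos in cpg_positions(ref):
    print(pos, round(methylation_level(reads, pos), 3))  # prints 1 0.667, then 5 0.0
```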

Methylation status: The state of methylation of a genomic sequence. This refers to the characteristics of a DNA segment at a particular genomic locus relevant to methylation. Such characteristics include, but are not limited to, whether any of the cytosine (C) residues within this DNA sequence are methylated, location of methylated C residue(s), percentage of methylated C at any particular stretch of residues, and allelic differences in methylation. A methylation profile can also describe the relative or absolute concentration of methylated C or unmethylated C at any particular stretch of residues in a biological sample.

Methyl-sensitive enzymes: DNA restriction endonucleases that are dependent on the methylation state of their DNA recognition site for activity. For example, there are methyl-sensitive enzymes that cleave at their DNA recognition sequence only if it is not methylated. Thus, an unmethylated DNA sample will be cut into smaller fragments than a methylated DNA sample. Similarly, a hypermethylated DNA sample will not be cleaved. In contrast, there are methyl-sensitive enzymes that cleave at their DNA recognition sequence only if it is methylated. As used herein, the terms “cleave”, “cut” and “digest” are used interchangeably.

Methyl-sensitive enzymes that digest unmethylated DNA suitable for use in methods of the invention include, but are not limited to, HpaII, HhaI, MaeII, BstUI and AciI. One such enzyme is HpaII, which cuts only the unmethylated sequence CCGG. Enzymes that digest only methylated DNA include, but are not limited to, DpnI, which cuts at the recognition sequence GATC, and McrBC, which belongs to the family of AAA⁺ proteins (New England BioLabs, Inc., Beverly, Mass.).
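The behavior of a methyl-sensitive enzyme such as HpaII can be illustrated with a small in silico digest. This is a sketch under simplifying assumptions: one strand only, cleavage at C^CGG, and a site is considered blocked only when its internal CpG cytosine is methylated (other methylation contexts and hemimethylation are ignored). The function name is hypothetical.

```python
def hpaii_digest(seq: str, methylated_c: set[int]) -> list[str]:
    """Simulate HpaII digestion of a single strand.

    HpaII recognizes CCGG and cuts between the two Cs (C^CGG). In this sketch
    a site is cut only if its internal CpG cytosine (0-based position
    site + 1) is absent from `methylated_c`.
    """
    seq = seq.upper()
    cuts = [
        i + 1
        for i in range(len(seq) - 3)
        if seq[i:i + 4] == "CCGG" and (i + 1) not in methylated_c
    ]
    bounds = [0, *cuts, len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


seq = "AACCGGTTCCGGAA"
print(hpaii_digest(seq, set()))   # ['AAC', 'CGGTTC', 'CGGAA']
print(hpaii_digest(seq, {3}))     # ['AACCGGTTC', 'CGGAA']
print(hpaii_digest(seq, {3, 9}))  # ['AACCGGTTCCGGAA']
```

As the definition above states, the unmethylated molecule is cut into the smallest fragments, and the fully methylated one is left intact.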

Cleavage methods and procedures for selected restriction enzymes for cutting DNA at specific sites are well known to the skilled artisan. For example, many suppliers of restriction enzymes, including New England BioLabs, Promega Corporation, Boehringer-Mannheim, and the like, provide information on conditions and types of DNA sequences cut by specific restriction enzymes. Sambrook et al. (Molecular Biology: A Laboratory Approach, Cold Spring Harbor, N.Y., 1989) provide a general description of methods for using restriction enzymes and other enzymes.

Oligonucleotide: An oligonucleotide is a plurality of nucleotides joined by native phosphodiester bonds, between about 6 and about 300 nucleotides in length. An oligonucleotide analog refers to moieties that function similarly to oligonucleotides but have non-naturally occurring portions. For example, oligonucleotide analogs can contain non-naturally occurring portions, such as altered sugar moieties or inter-sugar linkages, such as a phosphorothioate oligodeoxynucleotide. Functional analogs of naturally occurring polynucleotides can bind to RNA or DNA, and include peptide nucleic acid (PNA) molecules.

In several examples, oligonucleotides and oligonucleotide analogs can include linear sequences up to about 200 nucleotides in length, for example a sequence (such as DNA or RNA) that is at least 6 bases, for example at least 8, 10, 15, 20, 25, 30, 35, 40, 45, 50, 100 or even 200 bases long, or from about 6 to about 70 bases, for example about 10-25 bases, such as 12, 15 or 20 bases.

Polymorphic marker: A segment of genomic DNA that exhibits heritable variation in a DNA sequence between individuals. Such markers include, but are not limited to, single nucleotide polymorphisms (SNPs), restriction fragment length polymorphisms (RFLPs), short tandem repeats, such as di-, tri- or tetra-nucleotide repeats (STRs), and the like. Polymorphic markers can be used to specifically differentiate between a maternal and paternal allele in the enriched fetal nucleic acid sample.

Polymorphism: A variation in a gene sequence. The polymorphisms can be those variations (DNA sequence differences) which are generally found between individuals or different ethnic groups and geographic locations which, while having a different sequence, produce functionally equivalent gene products. Typically, the term can also refer to variants in the sequence which can lead to gene products that are not functionally equivalent. Polymorphisms also encompass variations which can be classified as alleles and/or mutations which can produce gene products which may have an altered function. Polymorphisms also encompass variations which can be classified as alleles and/or mutations which either produce no gene product or an inactive gene product or an active gene product produced at an abnormal rate or in an inappropriate tissue or in response to an inappropriate stimulus. Alleles are the alternate forms that occur at the polymorphism.

Polymorphisms can be referred to, for instance, by the nucleotide position at which the variation exists, by the change in amino acid sequence caused by the nucleotide variation, or by a change in some other characteristic of the nucleic acid molecule or protein that is linked to the variation.

Primers: Primers are nucleic acid molecules, usually DNA oligonucleotides of about 10-50 nucleotides in length (longer lengths are also possible). Typically, primers are at least about 15 nucleotides in length, such as at least about 20, 25, 30, or 40 nucleotides in length. For example, a primer can be about 10-50 nucleotides in length, such as, 10-30, 15-20, 15-25, 15-30, or 20-30 nucleotides in length. Primers can also be of a maximum length, for example no more than 25, 30, 40, or 50 nucleotides in length. Forward and reverse primers may be annealed to a complementary target DNA strand by nucleic acid hybridization to form hybrids between the primers and the target DNA strand, and then extended along the target DNA strand by a DNA polymerase enzyme to form an amplicon. One of skill in the art will appreciate that the hybridization specificity of a particular probe or primer typically increases with its length. Thus, for example, a probe or primer including 20 consecutive nucleotides typically will anneal to a target with a higher specificity than a corresponding probe or primer of only 15 nucleotides. In some embodiments, forward and reverse primers are used in combination in a bisulfite amplicon sequencing assay.

Sample: A sample, such as a biological sample, is a sample obtained from a subject. As used herein, biological samples include all clinical samples useful for detection of fetal aneuploidy, including, but not limited to, cells, tissues, and bodily fluids, such as: blood; derivatives and fractions of blood, such as serum; urine; sputum; or CVS samples. In a particular example, a sample includes blood obtained from a human subject, such as whole blood or serum. Microdeletions and microduplications can be measured in samples isolated from a subject.

Sensitivity and specificity: Statistical measurements of the performance of a binary classification test. Sensitivity measures the proportion of actual positives that are correctly identified. Specificity measures the proportion of actual negatives that are correctly identified.
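The two definitions translate directly into counts of true and false calls. The sketch below is generic and not tied to any data in this Example; the function name and the toy labels are hypothetical.

```python
def sensitivity_specificity(y_true: list[bool], y_pred: list[bool]) -> tuple[float, float]:
    """Return (sensitivity, specificity) for binary truth labels and predictions.

    sensitivity = TP / (TP + FN), specificity = TN / (TN + FP).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one actual positive and one actual negative")
    return tp / (tp + fn), tn / (tn + fp)


# Toy example: 3 affected and 4 unaffected samples.
truth = [True, True, True, False, False, False, False]
calls = [True, True, False, False, False, True, False]
sens, spec = sensitivity_specificity(truth, calls)
print(round(sens, 3), spec)  # 0.667 0.75
```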

Sequence Read: A sequence (e.g., of about 300 bp) of contiguous base pairs of a nucleic acid molecule. The sequence read may be represented symbolically by the base pair sequence (in ATCG) of the sample portion. It may be stored in a memory device and processed as appropriate to determine whether it matches a reference sequence or meets other criteria. A sequence read may be obtained directly from a sequencing apparatus or indirectly from stored sequence information concerning a sample.

Standard control: A value reflective of the ratio, or the amount or concentration of a fetal genomic sequence located on a chromosome relevant to a particular chromosomal aneuploidy (such as trisomy 13, 18, or 21) over the amount or concentration of a fetal genetic marker located on a reference chromosome, as the amounts or concentrations are found in a biological sample (for example, blood, plasma, or serum) from an average, healthy pregnant woman carrying a chromosomally normal fetus. A “standard control” can be determined differently and represent a different value depending on the context in which it is used. For instance, when used in an epigenetic-genetic dosage method where an epigenetic marker is measured against a genetic marker, the “standard control” is a value reflective of the ratio, or the amount or concentration of a fetal genomic sequence located on a chromosome relevant to a particular chromosomal aneuploidy (for example, trisomy 13, 18, or 21) over the amount or concentration of a fetal genetic marker located on a reference chromosome, as the amounts or concentrations are found in a biological sample (such as blood, plasma, or serum) from an average, healthy pregnant woman carrying a chromosomally normal fetus. In some embodiments, a standard control is determined based on an average healthy pregnant woman at a certain gestational age.

Subject: Living multi-cellular vertebrate organisms, a category that includes human and non-human mammals (such as laboratory or veterinary subjects).

Target sequence or region: A sequence of nucleotides located in a particular region in the human genome. The target can be for instance a coding sequence; it can also be the non-coding strand that corresponds to a coding sequence. The target can also be a non-coding sequence, such as an intronic sequence.

Unless otherwise explained, all technical and scientific terms used in this Example have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs. The singular terms “a,” “an,” and “the” include plural referents unless context clearly indicates otherwise. Similarly, the word “or” is intended to include “and” unless the context clearly indicates otherwise. It is further to be understood that all base sizes or amino acid sizes, and all molecular weight or molecular mass values, given for nucleic acids or polypeptides are approximate, and are provided for description. Although methods and materials similar or equivalent to those described herein can be used in the practice or testing of this disclosure, suitable methods and materials are described below. The term “comprises” means “includes.”

This Example provides an improved method for the detection of fetal aneuploidy and/or the detection of sub-chromosomal fetal copy number variations via bisulfite sequencing of maternal plasma. One intermediate step of the method is to estimate the percentage of DNA fragments in the maternal plasma that originate from the fetus. In some embodiments, the method for the detection of fetal aneuploidy includes the following steps.

First, the method comprises estimating the percentage of fetal DNA fragments in the sample using exclusively the reads aligned to the target chromosome for which we would like to test fetal aneuploidy, e.g., chromosome 21. Herein, this estimation is called F_(A).

Next, the method comprises estimating the percentage of fetal DNA fragments in the sample using the reads aligned to the reference chromosomes that we believe are not affected by fetal aneuploidy, e.g., chromosome 1, 2, 3, etc. Herein, this estimation is called F_(D).

The next step comprises calculating the difference F_(A)−F_(D).

If |F_(A)−F_(D)|>c₁, for some threshold c₁>0, we claim that the fetus has aneuploidy in the target chromosome. Specifically, if F_(A)−F_(D)>c₁, it is indicated that the fetus has an extra copy of the target chromosome. If F_(A)−F_(D)<−c₁, it is indicated that the fetus has only one copy of the target chromosome.

If |F_(A)−F_(D)|>c₂, for some threshold c₂ such that c₁>c₂>0, and F_(D)<c₃ for threshold c₃>0, it is undecided whether the fetus has aneuploidy in the target chromosome.

If |F_(A)−F_(D)|<c₂, it is undecided whether the fetus is not affected by aneuploidy in the target chromosome.
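The threshold rules above can be written as a small decision function. This sketch mirrors the stated rules only: it assumes F_(A) and F_(D) are given as fractions between 0 and 1, and it does not choose values for c₁, c₂, or c₃, which the text leaves open. The region c₂<|F_(A)−F_(D)|≤c₁ with F_(D)≥c₃ is not addressed by the rules as stated, so the sketch reports it explicitly rather than guessing an outcome. The function name and return labels are hypothetical.

```python
def call_aneuploidy(f_a: float, f_d: float, c1: float, c2: float, c3: float) -> str:
    """Apply the stated threshold rules to fetal-fraction estimates
    F_A (target chromosome) and F_D (reference chromosomes)."""
    if not (c1 > c2 > 0 and c3 > 0):
        raise ValueError("thresholds must satisfy c1 > c2 > 0 and c3 > 0")
    d = f_a - f_d
    if d > c1:
        return "aneuploidy: extra copy"   # F_A - F_D > c1
    if d < -c1:
        return "aneuploidy: one copy"     # F_A - F_D < -c1
    if abs(d) > c2 and f_d < c3:
        return "undecided"                # intermediate difference, low F_D
    if abs(d) < c2:
        return "undecided"                # small difference, as stated in the text
    return "not covered by stated rules"


# Hypothetical thresholds, chosen only to exercise each branch.
print(call_aneuploidy(0.15, 0.10, c1=0.03, c2=0.01, c3=0.05))
```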

In some embodiments, the method for the detection of sub-chromosomal fetal copy number variations includes the following steps:

First, the method comprises estimating the percentage of fetal DNA fragments in the sample using exclusively the reads aligned to a target genomic copy number variation region. The target copy number variation region may be on a target chromosome, or any chromosome. Herein, this estimation is called F_(AM).

Next, the method comprises estimating the percentage of fetal DNA fragments in the sample using the reads aligned to a reference or control copy number variation region that is known to not be significantly affected by fetal copy number variations. The reference copy number variation region may be on a reference chromosome or any chromosome. Herein, this estimation is called F_(DM).

The next step comprises calculating the difference F_(AM)−F_(DM).

If |F_(AM)−F_(DM)|>c₁, for some threshold c₁>0, we claim that the fetal DNA in the sample has a sub-chromosomal copy number variation in the target region. Specifically, if F_(AM)−F_(DM)>c₁, it is indicated that the fetal DNA in the sample has an extra copy of the target region (a microduplication). If F_(AM)−F_(DM)<−c₁, it is indicated that the fetal DNA in the sample is missing a copy of the target region (a microdeletion).

If |F_(AM)−F_(DM)|>c₂, for some threshold c₂ such that c₁>c₂>0, and F_(DM)<c₃ for some threshold c₃>0, it is undecided whether the fetus has sub-chromosomal copy number variations.

If |F_(AM)−F_(DM)|<c₂, it is undecided whether the fetus is not affected by sub-chromosomal copy number variations.
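The region-restricted variant differs from the chromosome-level test mainly in which CpG sites feed each estimate. The sketch below selects sites inside a target interval and a reference interval, estimates a fetal fraction in each, and compares the difference with c₁. Several pieces are assumptions of this sketch and are not specified in the text. The fetal fraction is estimated by least squares under a two-component mixture of a fetal reference and a maternal reference. By analogy with the aneuploidy rule, a positive difference is treated as a gain of the target region and a negative difference as a loss. All names, coordinates, and toy values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    """One CpG site with methylation levels as fractions in [0, 1]."""
    chrom: str
    pos: int
    plasma: float    # level in the maternal plasma library being tested
    fetal: float     # fetal reference level (e.g., average CVS)
    maternal: float  # maternal reference level (e.g., average MBC or nP plasma)


def fetal_fraction(sites: list[Site]) -> float:
    """Least-squares f for plasma ~= f * fetal + (1 - f) * maternal (assumed estimator)."""
    num = sum((s.plasma - s.maternal) * (s.fetal - s.maternal) for s in sites)
    den = sum((s.fetal - s.maternal) ** 2 for s in sites)
    if den == 0:
        raise ValueError("no sites that separate fetal from maternal methylation")
    return num / den


def sites_in(sites: list[Site], region: tuple[str, int, int]) -> list[Site]:
    """Sites on the region's chromosome with start <= pos < end."""
    chrom, start, end = region
    return [s for s in sites if s.chrom == chrom and start <= s.pos < end]


def call_cnv(sites: list[Site], target: tuple[str, int, int],
             reference: tuple[str, int, int], c1: float) -> str:
    """Compare F_AM (target region) with F_DM (reference region) against c1."""
    f_am = fetal_fraction(sites_in(sites, target))
    f_dm = fetal_fraction(sites_in(sites, reference))
    d = f_am - f_dm
    if d > c1:
        return "gain (microduplication)"
    if d < -c1:
        return "loss (microdeletion)"
    return "no call"


def mix(f: float, fetal: float, maternal: float) -> float:
    """Noise-free plasma level for fetal fraction f (toy data only)."""
    return f * fetal + (1 - f) * maternal


# Toy data: the target region behaves like f = 0.15, the reference like f = 0.10.
sites = [
    Site("chr22", 100, mix(0.15, 0.8, 0.2), 0.8, 0.2),
    Site("chr22", 200, mix(0.15, 0.1, 0.9), 0.1, 0.9),
    Site("chr1", 100, mix(0.10, 0.8, 0.2), 0.8, 0.2),
    Site("chr1", 300, mix(0.10, 0.1, 0.9), 0.1, 0.9),
]
print(call_cnv(sites, ("chr22", 0, 1_000), ("chr1", 0, 1_000), c1=0.03))
```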

Example Embodiment F of Example 3: Detection of Trisomy 21 by Bisulfite Sequencing of Maternal Plasma

Sequencing Datasets Used were as Follows:

CM dataset: CVS and MBC genome-wide targeted bisulfite sequencing data: We have 5 bisulfite sequencing libraries from 5 normal CVS samples and 6 bisulfite sequencing libraries from 6 normal MBC samples. Genome-wide targeting was achieved via solution-phase hybridization using an Agilent kit. There are about 2.8 million CpG sites (ranging from 1.2 to 3.5 million) covered by 5 or more reads in each library. In total we have 2.75 million CpG sites that are covered by 5 or more reads in at least 3 libraries in each group.

PnP dataset: Normal pregnant and non-pregnant plasma targeted bisulfite sequencing data: We have 6 bisulfite sequencing libraries from 6 normal maternal plasma samples and 9 bisulfite sequencing libraries from 9 normal non-pregnant women's plasma samples. Genome-wide targeting was achieved via solution-phase hybridization using the Roche kit. There are about 7.3 million CpG sites (ranging from 5.3 to 8.8 million) covered by 5 or more reads in each library. We have in total 6.84 million CpG sites that are covered by 5 or more reads in at least 4 libraries in each group.

T21N dataset: Maternal plasma (Normal and Trisomy 21) targeted bisulfite sequencing data included 14 libraries from 12 normal samples, and 12 libraries from 12 T21 samples. Genome-wide targeting was achieved via solution-phase hybridization using the Roche kit. In each library, there are about 7.7 million CpG sites covered by 5 or more reads (ranging from 4.86 million to 9.95 million). We have in total about 6.86 million CpG sites that are covered by 5 or more reads in at least 8 libraries in each group.
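Each dataset above is reduced to CpG sites covered by 5 or more reads in at least a minimum number of libraries in each group (3 for CM, 4 for PnP, 8 for T21N). A minimal sketch of that coverage filter follows. It assumes each library is available as a mapping from a CpG site identifier to read depth; the function name and data layout are hypothetical.

```python
from collections import Counter


def shared_covered_sites(
    groups: dict[str, list[dict[str, int]]], min_reads: int, min_libraries: int
) -> set[str]:
    """Return CpG site IDs covered by at least `min_reads` reads in at least
    `min_libraries` libraries of every group.

    Each group is a list of libraries; each library maps site ID -> read depth.
    """
    shared = None
    for libraries in groups.values():
        counts = Counter()
        for depth_by_site in libraries:
            # Count, per site, how many libraries in this group pass the depth cut.
            counts.update(s for s, depth in depth_by_site.items() if depth >= min_reads)
        passing = {s for s, n in counts.items() if n >= min_libraries}
        shared = passing if shared is None else shared & passing
    return shared if shared is not None else set()


# Toy example with two groups (e.g., normal and T21 libraries).
groups = {
    "normal": [{"s1": 6, "s2": 3}, {"s1": 5, "s2": 10}, {"s1": 2, "s3": 7}],
    "t21": [{"s1": 8, "s3": 9}, {"s1": 5, "s2": 5}],
}
print(shared_covered_sites(groups, min_reads=5, min_libraries=2))  # {'s1'}
```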

Model selection: The method first comprises determining the most appropriate predictive model for analyzing the methylation levels of libraries in the T21N data set.

Maternal plasma DNA can be considered as a mixture of the DNA from maternal tissues and fetal tissues. If we assume that among the fetal tissues, placenta contributes the most to the maternal plasma DNA, and among the maternal tissues, maternal white blood cells contribute the most to the maternal plasma DNA, we can approximate the fetal tissue signature using placental reference data, and the maternal tissue signature using maternal white blood cell reference data. This suggests that, for the first predictive model, we can predict the maternal plasma DNA methylation signature as a mixture of the average placental (CVS) DNA methylation signature and the average maternal white blood cells (MBC) DNA methylation signature. Herein this is referred to as the CM model.
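The mixture idea can be made concrete with one simple estimator: ordinary least squares across the selected CpG sites. This is an illustrative assumption, not necessarily the estimator used in Example 1. Let $y_i$ be the methylation level of CpG site $i$ in a maternal plasma library, $c_i$ the average CVS methylation level, and $m_i$ the average MBC methylation level at the same site, and let $f$ be the fetal frequency:

$$y_i \approx f\,c_i + (1-f)\,m_i \quad\Longleftrightarrow\quad y_i - m_i \approx f\,(c_i - m_i),$$

$$\hat{f} = \arg\min_{f} \sum_i \bigl[(y_i - m_i) - f\,(c_i - m_i)\bigr]^2 = \frac{\sum_i (y_i - m_i)(c_i - m_i)}{\sum_i (c_i - m_i)^2}.$$

Restricting the sums to sites on the target chromosome gives an estimate of F_(A), and restricting them to the reference chromosomes gives F_(D). Sites with $c_i \approx m_i$ contribute almost nothing to either sum, which is one motivation for preferring differentially methylated sites.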

Secondly, we can use the DNA from non-pregnant women's plasma to approximate the maternal plasma DNA coming from the maternal tissues. Therefore, as the second predictive model, we can predict the maternal plasma DNA methylation as a mixture of the average placental (CVS) DNA methylation and the average non-pregnant women's plasma (nP) DNA methylation. Herein this is referred to as the CnP model.

Finally, as discussed in Example 1, we can use mixtures of the maternal and fetal tissues to predict the maternal plasma DNA methylation, as long as the signals in these mixtures have different mix proportions. Therefore, as the third predictive model, we can predict the maternal plasma DNA methylation as a mixture of a certain average maternal plasma (P) DNA methylation and the average non-pregnant women's plasma (nP) DNA methylation. (Note that because different maternal plasma samples have different proportions of fetal DNA, the average maternal plasma methylation will differ between sets of samples.) Herein this is referred to as the PnP model.

Selection of CpG sites: CpG sites were selected in the target chromosome (chr21) and reference chromosomes (chr1, . . . , 12, 14, 15). We experimented with several different ways of identifying informative CpG sites for the detection of aneuploidy. In addition to the use of all CpG sites, we also considered the set of CpG sites that are differentially methylated between the maternal tissues and the fetal tissues that have the most significant influence on the maternal plasma DNA methylation pattern.

For the CM model, the CpG sites must be shared by the CM data and the T21N data. In total we found 2031580 shared CpG sites. Among them, 20859 are located in the target chromosome chr21, 1431135 are located in the reference chromosomes (chr1, . . . , 12, 14, 15). Furthermore, if we require the CpG sites to be differentially methylated between the CVS and MBC samples, with p value <=0.05 and difference in methylation level >=20, the number of CpG sites reduces to 648617, with 7409 in the target chromosome and 455660 in the reference chromosomes.

For the CnP model, the CpG sites must be shared by all three data sets: CM data, PnP data, and T21N data. In total we found 1970505 shared CpG sites. Among them, 20015 are located in the target chromosome chr21, 1391135 are located in the reference chromosomes (chr1, . . . , 12, 14, 15). Furthermore, if we require the CpG sites to be differentially methylated between the CVS and MBC samples, with p value <=0.05 and difference in methylation level >=20, the number of CpG sites reduces to 637599, with 7243 in the target chromosome and 448842 in the reference chromosomes.

For the PnP model, the CpG sites must be shared by two data sets: PnP data, and T21N data. In total we found 6386312 shared CpG sites. Among them, 69344 are located in the target chromosome chr21, 4498308 are located in the reference chromosomes (chr1, . . . , 12, 14, 15). Furthermore, if we require the CpG sites to be differentially methylated between the pregnant and non-pregnant plasma samples, with p value <=0.05 and difference in methylation level between 2 and 20, the number of CpG sites reduces to 546671, with 6316 in the target chromosome and 392185 in the reference chromosomes.
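The selection criteria above (p ≤ 0.05 with a difference of at least 20 for the CM and CnP models, and p ≤ 0.05 with a difference between 2 and 20 for the PnP model) can be expressed as one filter. This is a sketch under stated assumptions: the p-values are taken as precomputed because the statistical test is not named here, the difference is taken as an absolute difference in mean methylation level in percentage points, and the record type and function names are hypothetical.

```python
from typing import NamedTuple, Optional


class DiffMethResult(NamedTuple):
    site: str       # CpG site identifier, e.g. "chr21:1000" (hypothetical format)
    p_value: float  # from a two-group comparison (test not specified here)
    diff: float     # difference in mean methylation level, percentage points


def select_sites(
    results: list[DiffMethResult],
    max_p: float = 0.05,
    min_diff: float = 20.0,
    max_diff: Optional[float] = None,
) -> list[str]:
    """Keep sites with p <= max_p and min_diff <= |diff| (and |diff| <= max_diff if given)."""
    kept = []
    for r in results:
        d = abs(r.diff)
        if r.p_value <= max_p and d >= min_diff and (max_diff is None or d <= max_diff):
            kept.append(r.site)
    return kept


results = [
    DiffMethResult("chr21:1000", 0.01, 25.0),
    DiffMethResult("chr21:2000", 0.20, 30.0),
    DiffMethResult("chr1:500", 0.03, -22.0),
    DiffMethResult("chr1:900", 0.04, 5.0),
]
print(select_sites(results))                               # CM/CnP-style criteria
print(select_sites(results, min_diff=2.0, max_diff=20.0))  # PnP-style criteria
```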

Table 10 below summarizes the CpG sites used in our study:

TABLE 10
CpG sites used in Example Embodiment F.

Prediction model | CpG sites present in | Selection criteria | Total CpG sites | Target chr sites | Reference chr sites
CM | CM, T21N | None | 2031580 | 20859 | 1431135
CM | CM, T21N | P <= .05, DM >= 20 | 648617 | 7409 | 455660
CnP | CM, PnP, T21N | None | 1970505 | 20015 | 1391135
CnP | CM, PnP, T21N | P <= .05, DM >= 20 (CM) | 637599 | 7243 | 448842
PnP | PnP, T21N | None | 6386312 | 69344 | 4498308
PnP | PnP, T21N | P <= .05, 2 <= DM <= 20 | 546671 | 6316 | 392185

DM: difference in methylation level.

Application of the modified algorithm: The modified algorithm of this Example was applied to test which libraries in T21N are from trisomy 21 pregnancies.

We predicted the fetal frequency using the CpG sites shown in Table 10. For the CM model, we used the average methylation level of the selected CpG sites in the 5 CVS libraries and in the 6 MBC libraries, respectively, from the CM data set. For the CnP model, we used the average methylation level of the selected CpG sites in the 5 CVS libraries from the CM data set and in the 9 non-pregnant plasma libraries from the PnP data set, respectively. For the PnP model, we used the average methylation level of the selected CpG sites in the 6 maternal plasma libraries and in the 9 non-pregnant plasma libraries, respectively, from the PnP data set. The results of these analyses, shown in FIGS. 33-38, correspond to the 6 rows of Table 10, respectively.

Results: In each figure for the CM models (FIGS. 33 and 34) or CnP models (FIGS. 35 and 36), the Y axis in both panels shows the difference between the fetal frequency predicted by the CpG sites in the target chromosome (chr21) and the fetal frequency predicted by the CpG sites in the reference chromosomes. The red dots represent the normal maternal plasma samples and the blue dots the trisomy 21 (T21) maternal plasma samples. As expected, the difference is close to 0 for the normal samples but much higher in the T21 samples. The X axis of the right panel is the fetal frequency estimated using the CpG sites in the reference chromosomes. For the CM and CnP models, the difference between the chr21-predicted and the reference-chromosome-predicted fetal frequencies in the T21 samples is positively correlated with the estimated fetal frequency, with a slope of about 0.5. Note that the fetal frequency is probably overestimated in the CM model and will need some calibration. This is most likely because CVS and MBC represent the majority, though not all, of the tissue types that contribute to maternal plasma DNA.
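A slope near 0.5 is what an idealized mixture would predict. Assume that each DNA copy contributes to plasma in proportion to its source's share, that the fetal frequency on disomic reference chromosomes is f (so F_(D)=f), and that a trisomic fetus carries 3 copies of chr21 against the mother's 2. Then F_(A)=3f/(2+f), and F_(A)−F_(D)=f(1−f)/(2+f), whose ratio to f approaches 0.5 as f becomes small. The sketch below only evaluates this idealized relationship. It is not a fit to the data in FIGS. 33-36, and the function name is hypothetical.

```python
def apparent_target_fraction(f: float, fetal_copies: int, maternal_copies: int = 2) -> float:
    """Fetal fraction that would be estimated from a target chromosome when the
    fetus carries `fetal_copies` copies, the mother `maternal_copies`, and the
    fetal fraction on disomic reference chromosomes is f.

    Idealized: each copy contributes DNA in proportion to its source's share.
    """
    fetal = f * fetal_copies
    maternal = (1 - f) * maternal_copies
    return fetal / (fetal + maternal)


for f in (0.01, 0.05, 0.10, 0.20):
    diff = apparent_target_fraction(f, 3) - f  # F_A - F_D for a trisomic fetus
    print(f"f={f:.2f}  F_A-F_D={diff:.4f}  (F_A-F_D)/f={diff / f:.3f}")
```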

For the PnP models (FIGS. 37 and 38), the X axis in the right panel of the figures cannot be interpreted as fetal frequency. It can, however, be interpreted as the ratio of the fetal frequency in each library to the average fetal frequency in the 6 maternal plasma samples from the PnP data. The Y axis in the two panels, therefore, can be interpreted as the ratio of the difference between the chr21-predicted and the reference-chromosome-predicted fetal frequencies to the average fetal frequency of the 6 maternal plasma samples from the PnP data.
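A short derivation, under idealized assumptions, shows why the PnP estimate behaves as a ratio. Assume that the average maternal plasma profile $\bar{P}_i$ from the PnP data is itself a mixture with average fetal frequency $\bar{f}$, that is, $\bar{P}_i = \bar{f}\,c_i + (1-\bar{f})\,m_i$, where $c_i$ and $m_i$ are fetal and maternal methylation levels at site $i$. Assume also that non-pregnant plasma approximates the maternal contribution, $nP_i \approx m_i$. If a library with true fetal frequency $f$ is fitted as $y_i \approx a\,\bar{P}_i + (1-a)\,nP_i$, then

$$y_i \approx a\,\bar{f}\,c_i + a\,(1-\bar{f})\,m_i + (1-a)\,m_i = a\,\bar{f}\,c_i + (1 - a\,\bar{f})\,m_i .$$

Comparing this with $y_i = f\,c_i + (1-f)\,m_i$ gives $a\,\bar{f} = f$, that is, $a = f/\bar{f}$. The fitted proportion $a$ is therefore the library's fetal frequency relative to $\bar{f}$, and the chr21-minus-reference difference of fitted proportions is likewise scaled by $1/\bar{f}$.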

Example Embodiments for Example 3

C1. A method, comprising:

obtaining sequence reads of a methylation sequencing assay covering genomic segments of a maternal plasma sample from a pregnant subject;

estimating a first percentage of fetal DNA molecules (F_(A)) in the maternal plasma sample based on sequence reads that are aligned to a target chromosome;

estimating a second percentage of fetal DNA molecules (F_(D)) in the maternal plasma sample based on sequence reads that are aligned to one or more reference chromosomes; and

indicating the presence of a fetal aneuploidy in the pregnant subject responsive to an absolute value difference between the first percentage and the second percentage being larger than a threshold c₁, wherein c₁>0,

wherein the maternal plasma sample comprises maternal DNA and fetal DNA.
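Embodiment C1 can be sketched as follows, assuming each CpG site has already been annotated with the sample's methylation level and two reference levels, and assuming the same least-squares mixing estimate is used for both F_(A) and F_(D). The record layout, the function names, and the choice of estimator are illustrative assumptions; the embodiment itself only requires that the two percentages be estimated from reads on the target and reference chromosomes and compared against c₁ > 0.

```python
from typing import Iterable, List, Set, Tuple

# (chrom, level, fetal_ref, background_ref) for one CpG site -- hypothetical layout
CpgRow = Tuple[str, float, float, float]


def mixing_fraction(rows: Iterable[Tuple[float, float, float]]) -> float:
    """Least-squares f in: level ~= f * fetal_ref + (1 - f) * background_ref."""
    num = den = 0.0
    for y, p, b in rows:
        num += (y - b) * (p - b)
        den += (p - b) ** 2
    if den == 0.0:
        raise ValueError("no informative CpG sites in this group")
    return num / den


def call_aneuploidy(cpgs: List[CpgRow], target_chrom: str,
                    reference_chroms: Set[str], c1: float) -> Tuple[bool, float, float]:
    """Embodiment C1 rule (sketch): flag when |F_A - F_D| > c1. Returns (flag, F_A, F_D)."""
    if c1 <= 0:
        raise ValueError("c1 must be positive")
    f_a = mixing_fraction((y, p, b) for c, y, p, b in cpgs if c == target_chrom)
    f_d = mixing_fraction((y, p, b) for c, y, p, b in cpgs if c in reference_chroms)
    return abs(f_a - f_d) > c1, f_a, f_d
```

The embodiment requires only c₁ > 0; in practice a value would have to be chosen from the spread of F_(A) − F_(D) among normal samples, which is not specified here.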

C2. The method of embodiment C1, wherein the target chromosome is chromosome 21, chromosome 13, chromosome 18, or chromosome X.

C3. The method of embodiments C1 or C2, wherein the one or more reference chromosomes include one or more of chromosomes 1-12, 14-17, 19, 20 and 22.

C4. The method of any of embodiments C1-C3, wherein indicating fetal aneuploidy comprises indicating that a fetus of the pregnant subject has an abnormal number of copies of the target chromosome.

C5. The method of any of embodiments C1-C4, wherein the pregnant subject is a human.

C6. A method for detecting fetal aneuploidy in a maternal plasma DNA sample obtained from a pregnant woman as described herein.

C7. A method for detecting sub-chromosomal fetal copy number variation in a maternal plasma DNA sample obtained from a pregnant woman as described herein.

C8. A method, comprising:

obtaining sequence reads of a methylation sequencing assay covering genomic segments of a maternal plasma sample from a pregnant subject;

estimating a first percentage of fetal DNA molecules (F_(AM)) in the maternal plasma sample based on sequence reads that are aligned to a target copy number variation region of a chromosome of a genome;

estimating a second percentage of fetal DNA molecules (F_(DM)) in the maternal plasma sample based on sequence reads that are aligned to one or more reference regions of the genome; and

indicating the presence of a sub-chromosomal fetal copy number variation in the pregnant subject responsive to an absolute value difference between the first percentage and the second percentage being larger than a threshold c₁, wherein c₁>0, wherein the maternal plasma sample comprises maternal DNA and fetal DNA.
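For embodiment C8, the main change from C1 is how CpG sites are grouped: sites inside the target copy number variation region contribute to F_(AM), and sites in the reference regions contribute to F_(DM). A minimal partitioning helper is sketched below; the half-open, 0-based (chrom, start, end) interval convention, the record layout, and all names are assumptions. Each resulting subset would then be passed to whatever fetal-fraction estimator is in use, and the absolute difference compared with c₁.

```python
from typing import List, Sequence, Tuple

# (chrom, position, level, fetal_ref, background_ref) -- hypothetical record layout
CpgRecord = Tuple[str, int, float, float, float]
Region = Tuple[str, int, int]  # (chrom, start, end), 0-based, end exclusive


def in_region(rec: CpgRecord, region: Region) -> bool:
    """True when the CpG record lies on the region's chromosome within [start, end)."""
    chrom, start, end = region
    return rec[0] == chrom and start <= rec[1] < end


def split_by_region(cpgs: Sequence[CpgRecord], target: Region,
                    reference_regions: Sequence[Region]
                    ) -> Tuple[List[CpgRecord], List[CpgRecord]]:
    """Return (target-region CpGs, reference-region CpGs); CpGs in neither are dropped."""
    in_target = [r for r in cpgs if in_region(r, target)]
    in_reference = [
        r for r in cpgs
        if not in_region(r, target)  # never count a CpG on both sides
        and any(in_region(r, reg) for reg in reference_regions)
    ]
    return in_target, in_reference
```

Excluding target-region sites from the reference set keeps the two estimates independent even when a reference region is defined to span the whole chromosome that carries the target region.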

Abstract of this Example (Example 3)

Non-invasive methods are disclosed in this Example for prenatal diagnosis. These methods can be used to detect fetal aneuploidy and/or sub-chromosomal fetal copy number variations. The methods are performed on a biological sample including maternal plasma DNA, such as a plasma or blood sample from a pregnant woman.

Example 4—Methods and Materials for Assessing and Treating Endometriosis

Field of this Example

This Example relates to methods and materials involved in assessing a mammal (e.g., a human) for and/or treating a mammal (e.g., a human) having or developing endometriosis. For example, this Example provides methods and materials for using DNA methylation profiles of nucleic acid within a liquid biopsy (e.g., a plasma sample, a urine sample, a tampon blood sample, or a cervical or vaginal swab sample) to determine whether or not a mammal has, or is developing, endometriosis. This Example also provides methods, algorithms, and kits for diagnosing, prognosing, monitoring, classifying, and/or treating endometriosis.

Background of this Example

Endometriosis is a debilitating disease involving the growth of uterine tissue outside the uterus. The primary symptoms are pelvic pain and infertility. Nearly half of affected women have chronic pelvic pain, and in 70 percent of those, the pain occurs during menstruation. Pain with sex also is common. Infertility occurs in up to half of affected women. Less common symptoms include urinary or bowel symptoms. About 25 percent of affected women have no symptoms. Endometriosis can have both social and psychological effects.

Currently, definitive diagnosis requires a surgical biopsy, which is performed laparoscopically. Because this method is invasive, diagnosis, and therefore treatment, are frequently delayed considerably, resulting in prolonged and progressive pain and a risk of infertility.

Summary of this Example

This Example provides methods and materials involved in assessing a mammal (e.g., a human) for and/or treating a mammal (e.g., a human) having or developing endometriosis. For example, this Example provides methods and materials for using DNA methylation profiles of nucleic acid within a liquid biopsy (e.g., a plasma sample, a urine sample, a tampon blood sample, or a cervical or vaginal swab sample) to determine whether or not a mammal has, or is developing, endometriosis. Determining if a mammal (e.g., a human) has, or is likely to develop, endometriosis by assessing DNA methylation profiles of nucleic acid within a liquid biopsy (e.g., a plasma sample, a urine sample, a tampon blood sample, or a cervical swab sample) can aid in the identification of mammals (e.g., humans) that should be treated in a particular manner (e.g., by administering a hormone therapy, by administering a pain medication, and/or by performing a surgical treatment), for example, early in the disease process.

This Example also provides methods for diagnosing, prognosing, monitoring, classifying, and/or treating endometriosis. For example, the methods described in this Example can include determining the methylation status of one or more genomic loci in a sample (e.g., a plasma sample, a urine sample, a tampon blood sample, or a cervical swab sample) of a mammal (e.g., a female human). This Example further provides algorithms and kits for diagnosing, prognosing, monitoring, classifying, and/or treating endometriosis.

In one aspect, this Example provides a method for diagnosing, prognosing, classifying, and/or monitoring endometriosis in a mammal (e.g., a human female) comprising: (a) obtaining a sample (e.g., a plasma sample, a urine sample, a tampon blood sample, or a cervical or vaginal swab sample) from the mammal; (b) determining the methylation status and/or level of one or more genomic loci in the sample; (c) comparing the methylation status and/or level of the one or more genomic loci to a reference; and (d) identifying the mammal as having endometriosis, wherein the difference in the methylation status and/or level of the one or more genomic loci in the sample compared to the reference indicates the presence of endometriosis in the mammal.
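Steps (b) through (d) of this method can be sketched as below. The text does not fix how the comparison in step (c) is made; this example assumes the reference is a cohort of samples from individuals without endometriosis, scores each locus with a z-score against that cohort, and calls a difference when enough loci exceed a cut-off. The z-score approach, `z_cut`, `min_loci`, and all function names are illustrative assumptions, not validated parameters.

```python
from statistics import mean, stdev
from typing import Dict, List


def methylation_level(methylated: int, total: int) -> float:
    """Step (b): fraction of methylated calls at a locus."""
    if total <= 0:
        raise ValueError("locus has no coverage")
    return methylated / total


def compare_to_reference(sample: Dict[str, float],
                         reference: Dict[str, List[float]],
                         z_cut: float = 3.0) -> Dict[str, float]:
    """Step (c): z-score each locus against the reference cohort; keep |z| > z_cut."""
    flagged = {}
    for locus, level in sample.items():
        ref = reference[locus]          # needs at least two reference samples
        sd = stdev(ref)
        if sd == 0:
            continue                    # skip loci with no spread in the cohort
        z = (level - mean(ref)) / sd
        if abs(z) > z_cut:
            flagged[locus] = z
    return flagged


def identify(sample: Dict[str, float], reference: Dict[str, List[float]],
             z_cut: float = 3.0, min_loci: int = 1) -> bool:
    """Step (d): report a difference from the reference if enough loci are flagged."""
    return len(compare_to_reference(sample, reference, z_cut)) >= min_loci
```

The sign of each z-score records whether a locus is higher or lower than the reference, so the same output supports the increase-only, decrease-only, and mixed-direction cases described below.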

In another aspect, this Example provides a method of treating endometriosis in a mammal (e.g., a human female) comprising: (a) obtaining a sample (e.g., a plasma sample, a urine sample, a tampon blood sample, or a cervical or vaginal swab sample) from the mammal; (b) determining the methylation status and/or level of one or more genomic loci present in the sample; (c) comparing the methylation status and/or level of the one or more genomic loci to a reference or a set of predicted values; (d) identifying the mammal as having endometriosis, wherein the difference in the methylation status and/or level of the one or more genomic loci in the sample compared to the reference or a set of predicted values indicates the presence of endometriosis in the mammal; and (e) treating the mammal by administering a hormone therapy, by administering a pain medication, and/or by performing a surgical treatment.

In some cases, an increase in the level of methylation of the one or more genomic loci in a sample (e.g., a plasma sample, a urine sample, a tampon blood sample, or a cervical or vaginal swab sample) can indicate the presence of endometriosis in the mammal, or a decrease in the level of methylation of the one or more genomic loci in a sample (e.g., a plasma sample, a urine sample, a tampon blood sample, or a cervical swab sample) can indicate the presence of endometriosis in the mammal. In some cases, a decrease in the level of methylation of at least one of the one or more genomic loci in a sample and an increase in the level of methylation of at least one of the one or more genomic loci in the sample can indicate the presence of endometriosis in the mammal.

In some cases, the reference can be the methylation status and/or level of the one or more genomic loci in a sample (e.g., a plasma sample, a urine sample, a tampon blood sample, or a cervical swab sample) obtained from a mammal (e.g., a human female) that does not have endometriosis.

In another aspect, this Example provides a method of treating endometriosis in a mammal (e.g., a human female) comprising: (a) measuring the methylation status and/or level of one or more genomic loci present in a sample (e.g., a plasma sample, a urine sample, a tampon blood sample, or a cervical or vaginal swab sample) from the mammal prior to a treatment of endometriosis; (b) measuring the methylation status and/or level of one or more genomic loci present in a sample (e.g., a plasma sample, a urine sample, a tampon blood sample, or a cervical or vaginal swab sample) from the mammal during the treatment of endometriosis; and (c) continuing the treatment if the difference in the methylation status and/or level of the one or more genomic loci between the samples from prior to and during the treatment of endometriosis indicates the subject is responsive to the treatment. In some cases, the method further comprises (d) administering a different treatment to the mammal if the difference in the methylation status and/or level of the one or more genomic loci between the samples from prior to and during the treatment of endometriosis indicates the subject is not responsive to the treatment.
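The monitoring method above can be sketched as follows. Because the text allows either an increase or a decrease at a given locus to indicate response, the expected direction is supplied per locus rather than hard-coded. The minimum change, the fraction of loci required, and the function name are hypothetical placeholders.

```python
from typing import Dict


def treatment_decision(pre: Dict[str, float], during: Dict[str, float],
                       expected_direction: Dict[str, int],
                       min_change: float = 0.05,
                       min_fraction: float = 0.5) -> str:
    """Return "continue" or "switch" from pre- vs during-treatment methylation.

    pre, during        -- {locus: methylation level} before and during treatment
    expected_direction -- {locus: +1 or -1}: the change taken to indicate response
    min_change, min_fraction -- hypothetical cut-offs, not validated values
    """
    if not expected_direction:
        raise ValueError("no loci supplied")
    responding = sum(
        1 for locus, sign in expected_direction.items()
        if sign * (during[locus] - pre[locus]) >= min_change  # moved the expected way
    )
    return "continue" if responding / len(expected_direction) >= min_fraction else "switch"
```

Returning "switch" corresponds to optional step (d), administering a different treatment when the methylation change does not indicate a response.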

In some cases, an increase in the level of methylation of the one or more genomic loci in the sample indicates that the mammal is responsive to the treatment, or a decrease in the level of methylation of the one or more genomic loci in the sample indicates the mammal is responsive to the treatment. In certain embodiments, a decrease in the level of methylation of at least one of the one or more genomic loci in the sample and an increase in the level of methylation of at least one of the one or more genomic loci in the sample indicates the mammal is responsive to the treatment. In certain embodiments, an increase in the level of methylation of the one or more genomic loci in the sample indicates the mammal is responsive to the treatment, or a decrease in the level of methylation of the one or more genomic loci in the sample indicates the mammal is not responsive to the treatment. In certain embodiments, a decrease in the level of methylation of at least one of the one or more genomic loci in the sample and an increase in the level of methylation of at least one of the one or more genomic loci in the sample indicates the subject is not responsive to the treatment.

This Example further provides algorithms for diagnosing and/or monitoring a mammal having endometriosis. In certain embodiments, the algorithm of this Example can be used to classify endometriosis of a mammal (e.g., a human female).

In another aspect, this Example provides a kit for diagnosing, prognosing, and/or monitoring endometriosis in a mammal comprising a means for determining and/or detecting the methylation status of one or more genomic loci. In certain embodiments, the means comprises one or more primers and/or probes for determining and/or detecting the methylation status of the one or more genomic loci.

In certain embodiments, the one or more genomic loci are present within nucleic acids isolated from the sample. In certain embodiments, the one or more genomic loci are present within cell-free nucleic acids isolated from the sample.

Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention pertains. Although methods and materials similar or equivalent to those described herein can be used in the practice or testing of the present invention, suitable methods and materials are described below. All publications, patent applications, patents, and other references mentioned herein are incorporated by reference in their entirety. In case of conflict, the present specification, including definitions, will control. In addition, the materials, methods, and examples are illustrative only and not intended to be limiting.

Other features and advantages of the invention will be apparent from the following detailed description, and from the claims.

Description of this Example

This Example provides methods for diagnosing, prognosing, monitoring, classifying, and/or treating endometriosis. For example, the methods described herein can include determining the methylation status of one or more genomic loci in a sample of a mammal (e.g., a human female). In some cases, the methods described herein can include the use of an algorithm to diagnose, prognose, monitor, classify, and/or assist in the treatment of endometriosis.

Unless defined otherwise, all technical and scientific terms used in this Example generally have their ordinary meanings in the art, within the context of this Example and in the specific context where each term is used. The following references provide one of skill with a general definition of many of the terms used in this Example: Singleton et al., Dictionary of Microbiology and Molecular Biology (2nd ed. 1994); The Cambridge Dictionary of Science and Technology (Walker ed., 1988); The Glossary of Genetics, 5th Ed., R. Rieger et al. (eds.), Springer Verlag (1991); and Hale & Marham, The Harper Collins Dictionary of Biology (1991). Certain terms are discussed below, or elsewhere in the specification, to provide additional guidance to the practitioner in describing the compositions and methods of this Example and how to make and use them.

The terms “comprise(s),” “include(s),” “having,” “has,” “can,” “contain(s),” and variants thereof, as used herein, are intended to be open-ended transitional phrases, terms or words that do not preclude the possibility of additional acts or structures. The singular forms “a,” “an,” and “the” include plural references unless the context clearly dictates otherwise. The present Example also contemplates other embodiments “comprising,” “consisting of” and “consisting essentially of,” the embodiments or elements presented herein, whether explicitly set forth or not.

As used herein, the use of the word “a” or “an” when used in conjunction with the term “comprising” in the claims and/or the specification may mean “one,” but it is also consistent with the meaning of “one or more,” “at least one,” and “one or more than one.” Still further, the terms “having,” “including,” “containing,” and “comprising” are interchangeable, and one of skill in the art is cognizant that these terms are open ended terms.

The term “about” or “approximately” means within an acceptable error range for the particular value as determined by one of ordinary skill in the art, which will depend in part on how the value is measured or determined, i.e., the limitations of the measurement system. For example, “about” can mean within 3 or more than 3 standard deviations, per the practice in the art. Alternatively, “about” can mean a range of up to 20%, preferably up to 10%, more preferably up to 5%, and more preferably still up to 1% of a given value. Alternatively, particularly with respect to biological systems or processes, the term can mean within an order of magnitude, preferably within 5-fold, and more preferably within 2-fold, of a value.

In certain embodiments, the term “biomarker” refers to a marker (e.g., DNA methylation status) that allows detection of a disease and/or disorder in an individual, including detection of the disease or the disorder in its early stages. In certain embodiments, the term “biomarker” refers to a marker (e.g., DNA methylation status) that allows the characterization of a phenotype of a disease and/or a disorder in an individual. Early stage of a disease, as used herein, refers to the time period between the onset of the disease and the time point that signs or symptoms of the disease emerge. In certain non-limiting embodiments, the presence, absence, and/or level of a biomarker in a sample of a mammal (e.g., a human) is determined by comparing to a reference control.

The terms “reference sample,” “reference control,” “control,” or “reference,” as used interchangeably herein, refer to a control for a methylation status of a genomic locus that is to be detected in a sample of a mammal. In certain embodiments, a reference sample can be a sample from a healthy individual, e.g., an individual that does not have endometriosis. In certain embodiments, a reference sample can be a sample from a control individual that does not have the disease or phenotype to be detected by a biomarker disclosed herein. In certain embodiments, a control or reference can be the presence, absence, and/or a particular level of a methylation state of a genomic locus in a healthy individual. In certain embodiments, a reference can be a predetermined presence, absence, and/or particular level of a methylation state of a genomic locus that indicates a subject does not have endometriosis. In certain embodiments, a reference can be the methylation status of a locus in an individual having a disease or a phenotype, e.g., an individual that has endometriosis, where the methylation status of the locus is known to be not associated with the disease or the phenotype.

The term “a set of predicted values” refers to the methylation status of certain genomic loci for a sample, where the status of those loci is not directly measured from that sample. Rather, it is inferred from measurements of other loci for that sample and/or measurements of other samples, using mathematical or statistical models. The models usually assume that the sample for which the methylation status of those loci is to be predicted has a normal phenotype. This assumption may be either correct or wrong, but its correctness is not required for the inference of the predicted values.
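As one illustration of a set of predicted values, the sketch below fits an ordinary least-squares line that predicts a target locus from a single predictor locus across reference samples assumed to have a normal phenotype, then predicts the target locus for a new sample. The single-predictor linear model, the illustrative numbers, and the function names are assumptions; the definition above permits any mathematical or statistical model.

```python
from statistics import mean
from typing import Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit y ~ intercept + slope * x over normal samples."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("predictor locus does not vary across reference samples")
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - slope * mx, slope


def predicted_value(intercept: float, slope: float, predictor_level: float) -> float:
    """Predicted methylation of the target locus, assuming a normal phenotype."""
    return intercept + slope * predictor_level


# Illustrative values only: predictor and target loci measured in normal samples.
intercept, slope = fit_line([0.1, 0.2, 0.3, 0.4], [0.25, 0.35, 0.45, 0.55])
expected = predicted_value(intercept, slope, 0.5)
residual = 0.85 - expected  # observed minus predicted for a new sample
```

A large residual indicates that the measured status departs from what the normal-phenotype model predicts, which is the comparison the methods below make against "a set of predicted values."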

The term “slightly invasive or non-invasive method” refers to a method that does not involve the removal of tissues by biopsy from the uterus or endometrium. In certain embodiments, slightly invasive or non-invasive methods, as described herein, include obtaining plasma, urine, or a cervical or vaginal swab from a subject.

The term “patient” or “subject,” as used interchangeably herein, refers to any warm-blooded animal, e.g., human or non-human. Non-limiting examples of non-human subjects include mammals, non-human primates, dogs, cats, mice, rats, guinea pigs, rabbits, fowl, pigs, horses, cows, goats, sheep, etc. In certain embodiments, the subject is human.

The term “nucleic acid,” “nucleic acid molecule,” or “polynucleotide” includes any compound and/or substance that comprises a polymer of nucleotides. Each nucleotide is composed of a base, specifically a purine or pyrimidine base (i.e., cytosine (C), guanine (G), adenine (A), thymine (T), or uracil (U)), a sugar (i.e., deoxyribose or ribose), and a phosphate group. In certain embodiments, the nucleic acid molecule is described by the sequence of bases, whereby said bases represent the primary structure (linear structure) of a nucleic acid molecule. The sequence of bases is typically represented from 5′ to 3′. These terms encompass deoxyribonucleic acid (DNA) including, e.g., complementary DNA (cDNA) and genomic DNA, ribonucleic acid (RNA), in particular messenger RNA (mRNA), synthetic forms of DNA or RNA, and mixed polymers comprising two or more of these molecules. The herein described nucleic acid molecule can contain naturally occurring or non-naturally occurring nucleotides. Examples of non-naturally occurring nucleotides include modified nucleotide bases with derivatized sugars or phosphate backbone linkages or chemically modified residues.

The term “isolated” (e.g., isolated genomic DNA) refers to a biological component that has been substantially separated or purified away from other biological components in the cell of the organism in which the component naturally occurs, e.g., other chromosomal and extra-chromosomal DNA and RNA, proteins, and organelles. Nucleic acids, e.g., DNA, that have been “isolated” include nucleic acids purified by standard purification methods.

The term “genomic locus” or “genomic DNA locus,” as used herein, refers to any fixed position in a genome. For example, a genomic locus can refer to a genomic element, a chromosomal region, a gene, a region of a gene, e.g., an exon or intron, a regulatory region of a gene, e.g., a promoter or enhancer, a CpG site, a CpG island, or a CpG island shore. For example, a genomic locus can include one or more CpG sites, e.g., between about 1 and about 100 CpG sites. In certain embodiments, a genomic locus can be of any length, e.g., between about 1 and about 10,000 nucleotides in length.

As used interchangeably herein, “methylation state,” “methylation profile,” “methylation status,” and “methylation level” refer to the presence, absence, percentage, and/or quantity of methylation at a particular nucleotide, or nucleotides, within a DNA region, e.g., a genomic locus. The methylation status of a particular DNA sequence (e.g., a genomic locus) can indicate the methylation state of every nucleotide in the sequence, the methylation state of any of the nucleotides (e.g., cytosines) in the sequence, or the methylation state of a subset of the nucleotides (e.g., of cytosines). It can also indicate the percentage or fraction of methylated cytosines within any particular stretch of nucleotides in the sequence, or the average rate of methylation of all the cytosines (or a subset of the cytosines) present in a nucleic acid.
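Two of the locus-level summaries listed above, the average methylation rate across the cytosines of a region and the fraction of methylated cytosine calls, can differ when sequencing coverage is uneven. The sketch below assumes per-CpG counts of methylated and total reads (for example, from bisulfite sequencing) and computes both; the input layout and function names are hypothetical.

```python
from typing import Dict, Tuple

# {cpg_position: (methylated_reads, total_reads)} -- hypothetical input layout
Calls = Dict[int, Tuple[int, int]]


def cpg_levels(calls: Calls) -> Dict[int, float]:
    """Per-CpG methylation fraction, skipping sites with no coverage."""
    return {pos: m / t for pos, (m, t) in calls.items() if t > 0}


def locus_mean_level(calls: Calls) -> float:
    """Average of per-CpG fractions: every covered CpG counts equally."""
    levels = cpg_levels(calls)
    if not levels:
        raise ValueError("no covered CpG sites in this locus")
    return sum(levels.values()) / len(levels)


def locus_pooled_level(calls: Calls) -> float:
    """Fraction of all methylated calls pooled across the locus: deep sites weigh more."""
    methylated = sum(m for m, _ in calls.values())
    total = sum(t for _, t in calls.values())
    if total == 0:
        raise ValueError("no coverage in this locus")
    return methylated / total
```

For counts {100: (9, 10), 150: (1, 2)}, the mean of per-CpG fractions is 0.7 while the pooled fraction is 10/12, because the deeply covered site dominates the pooled value.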

As used herein, a “methylated nucleic acid molecule” refers to a nucleic acid molecule that contains one or more methylated nucleotides.

As used herein, a “CpG site” or “methylation site” is a nucleotide within a nucleic acid that is susceptible to methylation either by naturally occurring events in vivo or by an event instituted to chemically methylate the nucleotide in vitro.

A “CpG island,” as used herein, describes a segment of a nucleic acid, e.g., a DNA sequence, that has a high frequency of CpG dinucleotides. See, e.g., Illingworth and Bird, FEBS Letters, 2009; 583:1713-1720. For example, Yamada et al. (Genome Research, 2004; 14:247-266) have described a set of standards for determining a CpG island: it must be at least 400 nucleotides in length, have a GC content greater than 50%, and have an OCF/ECF ratio greater than 0.6. Others (Takai et al., Proc. Natl. Acad. Sci. U.S.A., 2002; 99:3740-3745) have defined a CpG island less stringently as a sequence of at least 200 nucleotides in length, having a greater than 50% GC content and an OCF/ECF ratio greater than 0.6.
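The numeric criteria above can be checked directly on a sequence. This sketch assumes the common convention that OCF/ECF is the observed CpG count divided by an expected count of (number of C × number of G) / length; the thresholds default to the values quoted above, and the function names are hypothetical. A genome-wide island caller would also scan sliding windows, which is omitted here.

```python
from typing import Tuple


def cpg_island_stats(seq: str) -> Tuple[int, float, float]:
    """Return (length, GC fraction, observed/expected CpG ratio) for a DNA sequence."""
    s = seq.upper()
    n = len(s)
    c, g = s.count("C"), s.count("G")
    cpg = s.count("CG")  # "CG" cannot overlap itself, so str.count is exact
    gc = (c + g) / n if n else 0.0
    obs_exp = cpg * n / (c * g) if c and g else 0.0  # cpg / ((c * g) / n)
    return n, gc, obs_exp


def is_cpg_island(seq: str, min_len: int = 400,
                  min_gc: float = 0.5, min_obs_exp: float = 0.6) -> bool:
    """Length >= min_len, GC content > min_gc, and OCF/ECF > min_obs_exp."""
    n, gc, obs_exp = cpg_island_stats(seq)
    return n >= min_len and gc > min_gc and obs_exp > min_obs_exp
```

Passing `min_len=200` applies the less stringent length criterion quoted above while keeping the same GC and OCF/ECF thresholds.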

A “CpG island shore,” as used herein, refers to methylation hotspots located a short distance, e.g., less than 2 kb, from CpG islands.

The term “methylome,” as used herein, refers to the amount or pattern of methylation at different sites or regions within a population of cells. The methylome can correspond to all of the genome, a subset of the genome (e.g., repeat elements in the genome), or a portion of the subset (e.g., those areas found to be associated with endometriosis). A methylome from plasma can be referred to as a “plasma fluid methylome” or a “plasma fluid DNA methylome.” The plasma fluid methylome is an example of a cell-free methylome that includes cell-free DNA (cfDNA).

As used herein, the term “increase” refers to altering positively by at least about 2%, including, but not limited to, altering positively by about 5%, by about 10%, by about 15%, by about 20%, by about 25%, by about 30%, by about 35%, by about 40%, by about 45%, by about 50%, by about 55%, by about 60%, by about 65%, by about 70%, by about 75%, by about 80%, by about 85%, by about 90%, by about 95% or by about 100%.

As used herein, the terms “reduce,” “reduction,” or “decrease” refer to altering negatively by at least about 2%, including, but not limited to, altering negatively by about 5%, by about 10%, by about 15%, by about 20%, by about 25%, by about 30%, by about 35%, by about 40%, by about 45%, by about 50%, by about 55%, by about 60%, by about 65%, by about 70%, by about 75%, by about 80%, by about 85%, by about 90%, by about 95% or by about 100%.
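The definitions above leave open whether the threshold of about 2% applies to a relative or an absolute change. The sketch below assumes a relative change with respect to the reference value; the function name and the 2% default are illustrative.

```python
def classify_change(reference: float, observed: float,
                    min_fraction: float = 0.02) -> str:
    """Label a change as "increase", "decrease", or "no change".

    Uses the relative change (observed - reference) / reference, an assumption:
    the definitions do not say whether the ~2% threshold is relative or absolute.
    """
    if reference == 0:
        raise ValueError("relative change is undefined for a zero reference")
    rel = (observed - reference) / reference
    if rel >= min_fraction:
        return "increase"
    if rel <= -min_fraction:
        return "decrease"
    return "no change"
```

Under this reading, a methylation level moving from 0.50 to 0.55 is a 10% increase, while a move from 0.50 to 0.505 (1%) falls below the threshold.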

As described herein, this Example provides methods for diagnosing, monitoring, classifying, and/or treating endometriosis by analyzing the methylation status of one or more genomic loci in a sample (e.g., a plasma sample, a urine sample, a tampon blood sample, or a cervical or vaginal swab sample) of a mammal (e.g., a human female). In certain embodiments, the methods can include using an algorithm described herein. In certain embodiments, the methods described herein can allow for the early diagnosis or screening of a subject with endometriosis, e.g., a subject who has no symptoms or only early symptoms of endometriosis.

In certain embodiments, samples obtained for use in the methods described herein can include cfDNA, which carries DNA methylation information from the cell of origin. cfDNA can arise from cellular apoptosis and necrosis, and can be generated by active secretory processes, with the formation of extracellular vesicles. DNA methylation signatures are highly tissue-specific and carry in vivo information about the tissue source of cfDNA. In certain embodiments, the methods described herein can include analyzing cfDNA in a sample (e.g., a plasma sample, a urine sample, a tampon blood sample, or a cervical or vaginal swab sample) to identify genetic phenotypes that are drivers and/or consequences of endometriosis.

The sample from the subject can be collected using any appropriate technique. For example, a blood sample, a plasma sample, a urine sample, a tampon blood sample, or a cervical or vaginal swab sample can be collected using standard methods. In some cases, the sample can be collected from the subject before the subject has any symptom of endometriosis, i.e., a non-symptomatic subject. In certain embodiments, the sample can be collected from a non-symptomatic subject who is at high risk of endometriosis. In certain embodiments, the sample can be collected from a subject who has previously received or is currently receiving a treatment for endometriosis. In certain embodiments, two or more samples (e.g., two or more, three or more, four or more, five or more, six or more or seven or more samples) can be obtained before and while the subject is receiving a treatment for endometriosis (e.g., serially obtained samples).

Diagnostic, Prognostic, Classification, and Monitoring Methods of this Example

This Example provides diagnostic and prognostic methods for diseases and/or disorders that are characterized by differential methylation of genomic loci. For example, this Example provides methods for diagnosing, prognosing, classifying, and/or monitoring endometriosis in a subject that include analyzing the methylation status of certain genomic loci.

In certain embodiments, the analyzed genomic loci can include one or more genomic loci that exhibit differential methylation in a sample from a subject that has endometriosis compared to a reference sample. For example, the methods described herein can include assessing the methylation status of one or more genomic loci, e.g., about 5 or more, about 10 or more, about 50 or more, about 100 or more, about 500 or more, about 1,000 or more, about 5,000 or more, about 10,000 or more, about 25,000 or more, about 50,000 or more or about 100,000 or more genomic loci in a sample of a subject. In certain embodiments, the one or more genomic loci can be one or more promoter regions of one or more genes, one or more exons of one or more genes, one or more introns of one or more genes, one or more CpG sites, one or more CpG islands, one or more CpG island shores, one or more enhancers of one or more genes, or a combination thereof. In certain embodiments, the genomic loci are present in intergenic regions. In certain embodiments, the genomic loci are present on a particular chromosome.

In certain embodiments, this Example provides methods for diagnosing, prognosing, and/or monitoring endometriosis in a subject by detecting the DNA methylation profiles associated with endometriosis. In certain embodiments, the methods described herein can include (a) obtaining a sample from the subject, (b) determining the methylation status of one or more genomic loci present in the sample, e.g., present within cfDNA in a plasma sample, a urine sample, a tampon blood sample, or a cervical or vaginal swab sample, (c) comparing the methylation status of the one or more genomic loci to a reference or a set of predicted values, and (d) diagnosing endometriosis in the subject. In certain embodiments, the difference in the methylation status of the one or more genomic loci in the sample compared to the reference or a set of predicted values indicates the presence of endometriosis in the subject. In certain embodiments, the difference in the methylation status also can indicate the severity of endometriosis.

In certain embodiments, the methods described herein for diagnosing, prognosing, and/or monitoring endometriosis in a subject can include (a) obtaining a sample from the subject, (b) determining the level of methylation of one or more genomic loci present in the sample, (c) comparing the level of methylation of the one or more genomic loci to a reference or a set of predicted values, and (d) diagnosing endometriosis in the subject. In certain embodiments, the difference in the level of methylation of the one or more genomic loci in the sample compared to the reference or a set of predicted values indicates the presence of endometriosis in the subject. In certain embodiments, the difference in the methylation level also can indicate the severity of endometriosis.

In certain embodiments, diagnosing endometriosis in the subject can include characterizing a phenotype of the endometriosis, wherein the difference in the methylation status of the one or more genomic loci in the sample compared to the reference or a set of predicted values indicates the phenotype of the endometriosis. In certain embodiments, the phenotype of the endometriosis can include the severity of the endometriosis, prognosis of the endometriosis, molecular expression profile of the endometriosis, responsiveness of the endometriosis to certain treatments, or any combinations thereof.

In certain embodiments, the methods described herein for determining if a subject is at risk of developing endometriosis can include (a) obtaining a sample from the subject, (b) determining the level of methylation of one or more genomic loci present in the sample, (c) comparing the level of methylation of the one or more genomic loci to a reference or a set of predicted values, and (d) determining that the subject is at risk of developing endometriosis, wherein the difference in the level of methylation of the one or more genomic loci in the sample compared to the reference or a set of predicted values indicates that the subject is at risk.

In certain embodiments, diagnosing, prognosing, and/or monitoring of a subject with endometriosis can be based on a higher or lower methylation level of the genomic locus in the sample of the subject relative to the methylation level in a reference sample, e.g., a sample from a subject that does not have endometriosis. In certain embodiments, a difference of greater than about 5%, greater than about 10%, greater than about 15%, greater than about 20%, greater than about 25%, greater than about 30%, greater than about 35%, greater than about 40%, greater than about 45%, greater than about 50%, greater than about 55%, greater than about 60%, greater than about 65%, greater than about 70%, greater than about 75%, greater than about 80%, greater than about 85%, greater than about 90% or greater than about 95% in the methylation (e.g., level, percentage and/or fraction) of the one or more genomic loci in a sample obtained from a subject compared to a control can be indicative that the subject has endometriosis or is at risk of developing endometriosis. In certain embodiments, the difference can be a decrease in methylation (e.g., level, percentage, and/or fraction) of the genomic loci in the sample of the subject. Alternatively, the difference can be an increase in methylation (e.g., level, percentage, and/or fraction) of the genomic loci in the sample of the subject. In certain embodiments, the difference can be a decrease in the methylation of a genomic locus and an increase in the methylation of a different genomic locus in the sample obtained from the subject. In certain embodiments, a decrease in the level of methylation of one or more genomic loci in the sample and an increase in the level of methylation of one or more different genomic loci in the sample can indicate the presence of endometriosis.
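The comparison above, in which a decrease at some loci and an increase at others can together be informative, can be sketched as a split of loci into hypermethylated and hypomethylated sets. This example treats the threshold as an absolute difference in methylation fraction and defaults to 0.10 (10 percentage points), one of the listed example values; both choices and the function name are assumptions.

```python
from typing import Dict, List, Tuple


def differential_loci(sample: Dict[str, float], control: Dict[str, float],
                      threshold: float = 0.10) -> Tuple[List[str], List[str]]:
    """Return (hypermethylated loci, hypomethylated loci) relative to a control.

    sample, control -- {locus: methylation fraction in [0, 1]}
    threshold       -- absolute difference required (assumed interpretation)
    """
    hyper, hypo = [], []
    for locus, level in sample.items():
        diff = level - control[locus]
        if diff > threshold:
            hyper.append(locus)
        elif diff < -threshold:
            hypo.append(locus)
    return sorted(hyper), sorted(hypo)
```

Keeping the two directions separate lets a downstream rule require an increase at one set of loci together with a decrease at another, rather than only counting loci that differ in either direction.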

In certain embodiments, diagnosis of a subject with endometriosis can be based on the methylated or unmethylated state of a genomic locus, e.g., a CpG site. In certain embodiments, a genomic locus, e.g., a CpG site, in a sample from a subject diagnosed with endometriosis can be methylated and the genomic locus, e.g., the CpG site, in a reference sample can be unmethylated. In certain embodiments, a genomic locus in a sample from a subject diagnosed with endometriosis can be unmethylated and the genomic locus in a reference sample can be methylated.

Diagnostic, Prognostic, Classification, and Monitoring Methods Using an Algorithm of this Example

This Example also provides diagnostic and prognostic methods for diseases and/or disorders that are characterized by differential methylation of genomic loci, using an algorithm as described, for example, in Example Embodiment H. For example, this Example provides methods for diagnosing, prognosing, classifying, and/or monitoring endometriosis in a subject that include analyzing the methylation status of certain genomic loci and/or genomic fractions.

Methods for Treating Endometriosis

This Example also provides methods for treating a subject having endometriosis. For example, a mammal (e.g., a human female) that was identified as having endometriosis as described herein (or identified as being at risk of developing endometriosis as described herein) can be administered one or more hormone therapies, one or more pain medications, or a combination thereof to treat endometriosis. Examples of hormone therapies that can be used as described herein include, without limitation, gonadotropin-releasing hormone therapies (e.g., elagolix), estrogen therapies, progestin therapies, estrogen and progestin combination therapies, progesterone therapies, progesterone and progestin combination therapies, danazol therapies, and gestrinone therapies. Examples of pain medications that can be used as described herein include, without limitation, nonsteroidal anti-inflammatory drugs. In some cases, a mammal (e.g., a human female) that was identified as having endometriosis as described herein (or identified as being at risk of developing endometriosis as described herein) can be treated using a surgical procedure to treat endometriosis. Examples of surgical procedures that can be used as described herein include, without limitation, laparoscopic surgeries to remove one or more endometriosis patches, laparotomy surgeries to remove one or more endometriosis patches, and surgeries to sever pelvic nerves (e.g., presacral neurectomy or laparoscopic uterine nerve ablation surgeries).

In some cases, the information provided by the methods described herein can be used by a clinician or physician in determining the most effective course of treatment (e.g., preventative or therapeutic) for the subject. A course of treatment refers to the measures taken for a patient after the prognosis or the assessment of increased risk for development of endometriosis is made. For example, when a subject is identified to have an increased risk of developing endometriosis, the physician can determine whether frequent monitoring of DNA methylation changes can be performed as a prophylactic measure. Also, when a subject is diagnosed with endometriosis (e.g., based on the presence of a DNA methylation pattern in a sample from a subject), it can be advantageous to follow such detection with a therapeutic treatment.

In some cases, this Example provides methods for assessing the efficacy of a therapeutic or prophylactic therapy for treating endometriosis in a subject, comprising determining the methylation status of one or more genomic loci present in a sample obtained from a subject prior to the therapy and determining methylation status of the one or more genomic loci present in a sample obtained from the subject at one or more time points during the therapeutic or prophylactic therapy, wherein the therapy is efficacious for treating endometriosis in a subject when there is a change in the presence and/or level of methylation of the one or more genomic loci in the second or subsequent samples, relative to the first sample. In certain embodiments, the first sample is obtained after therapeutic treatment has begun.

In certain embodiments, the methods for monitoring the response in a subject to prophylactic or therapeutic treatment of endometriosis can include measuring the methylation status and/or level of one or more genomic loci in a sample of a subject at a first time-point, administering a therapeutic agent, re-measuring the methylation status and/or level of the one or more genomic loci at a second time-point, comparing the results of the first and second measurements and optionally modifying the treatment regimen based on the comparison. In certain embodiments, the first time-point can be prior to an administration of the therapeutic agent, and the second time-point can be after said administration of the therapeutic agent. In certain embodiments, the first time-point can be prior to the administration of the therapeutic agent to the subject for the first time. In certain embodiments, the dose (defined as the quantity of therapeutic agent administered at any one administration) can be increased or decreased in response to the comparison. In certain embodiments, the dosing interval (defined as the time between successive administrations) can be increased or decreased in response to the comparison, including total discontinuation of treatment. In addition, the methods described herein can be used to determine the efficacy of the therapeutic treatment, wherein a change in the methylation status of certain genomic loci present in a sample of a subject can indicate that the therapeutic treatment regimen can be altered, reduced, and/or stopped.

Assays of this Example

This Example also provides assays and/or methods for determining the DNA methylation status and/or level of genomic loci that correlate with the presence, absence, and/or severity of endometriosis. In some cases, the assay method can include comparing the methylation status and/or level of genomic loci present in a sample from a subject that has endometriosis to the methylation status and/or level of genomic loci in a sample from a healthy subject to determine the methylation pattern, as described above, that correlates with the presence of endometriosis. In some cases, the assay methods can include comparing the methylation status and/or level of genomic loci in a sample from a subject that has endometriosis at an early stage to the methylation status and/or level of genomic loci in a sample from a subject that has endometriosis at a late stage to determine the methylation status and/or level that correlates with the different stages and/or severity of endometriosis.

DNA Isolation Techniques of this Example

In certain embodiments, the methods described herein can include isolating nucleic acid from a sample (e.g., a plasma sample, a urine sample, a tampon blood sample, or a cervical or vaginal swab sample) obtained from a subject. Any appropriate technique can be used to isolate nucleic acids from a sample. For example, isolation of DNA from a plasma sample can be performed by extraction methods using organic solvents such as a mixture of phenol and chloroform, followed by precipitation with ethanol (see, for example, J. Sambrook et al., "Molecular Cloning: A Laboratory Manual," 1989, 2nd Ed., Cold Spring Harbor Laboratory Press: New York, N.Y.). Additional non-limiting examples include salting-out DNA extraction (see, for example, P. Sunnucks et al., Genetics, 1996, 144:747-756; and S. M. Aljanabi and I. Martinez, Nucl. Acids Res. 1997, 25:4692-4693), the trimethylammonium bromide salts DNA extraction method (see, for example, S. Gustincich et al., BioTechniques, 1991, 11:298-302), and the guanidinium thiocyanate DNA extraction method (see, for example, J. B. W. Hammond et al., Biochemistry, 1996, 240:298-300). There are also numerous commercially available kits that can be used to extract DNA from biological fluids (e.g., plasma samples) or cells. For example, Qiagen's Gentra PureGene Cell Kit, QIAamp Circulating Nucleic Acid Kit, QIAamp DNA Mini Kit, DNeasy Blood and Tissue Kit, or QIAamp DNA Blood Mini Kit (Qiagen, Hilden, Germany), GenomicPrep™ Blood DNA Isolation Kit (Promega, Madison, Wis.), and GFX™ Genomic Blood DNA Purification Kit (Amersham, Piscataway, N.J.) can be used to obtain DNA from a sample from a subject.

Methylation Detection Techniques of this Example

Various methylation analysis procedures are known in the art and can be used with the methods described herein. These assays allow for determination of the methylation state of one or more genomic loci, e.g., one or more CpG sites or CpG islands within a nucleic acid obtained from a sample. In addition, the methods can be used to quantify the methylation of a genomic locus. Such assays involve, among other techniques, DNA sequencing of bisulfite-treated DNA, PCR (for sequence-specific amplification), digital PCR, and the use of methylation-sensitive restriction enzymes.

In certain embodiments, methylation-specific PCR can be used to determine the methylation status of a genomic locus. Methylation-specific PCR is based on a chemical reaction of sodium bisulfite with DNA that converts unmethylated cytosines (e.g., of CpG dinucleotides) to uracil (converting CpG to UpG), followed by traditional PCR. Methylated cytosines are not converted in this process. Primers can therefore be designed to overlap the methylation site of interest (e.g., a CpG site), which allows one to determine whether the site is methylated or unmethylated. Additionally, restriction enzyme digestion of PCR products amplified from bisulfite-converted DNA can be used, e.g., the method described by Sadri & Hornsby (Nucl. Acids Res. 1996; 24:5058-5059) or COBRA (Combined Bisulfite Restriction Analysis) (Xiong & Laird, Nucleic Acids Res. 1997; 25:2532-2534).
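The chemistry that methylation-specific PCR relies on can be illustrated with an in-silico conversion. This is a simplified sketch for one DNA strand. It assumes complete conversion of unmethylated cytosines and complete protection of methylated ones. The sequence and the function names `cpg_positions` and `bisulfite_convert` are hypothetical. After conversion, a primer overlapping a CpG site "sees" CG if the site was methylated and TG if it was not. That difference is what allows methylated and unmethylated templates to be told apart.

```python
# Sketch: simulate sodium bisulfite conversion + PCR on one DNA strand.
# Assumptions: conversion is complete, methylated C is fully protected,
# and uracil is read as thymine after amplification.


def cpg_positions(seq):
    """0-based indices of the C in each CpG dinucleotide of `seq`."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"]


def bisulfite_convert(seq, methylated_positions):
    """Return the post-PCR sequence of a bisulfite-treated strand.

    Unmethylated C is deaminated to U, which PCR copies as T; a C whose
    0-based index is in `methylated_positions` is protected and stays C.
    """
    out = []
    for i, base in enumerate(seq.upper()):
        if base == "C" and i not in methylated_positions:
            out.append("T")  # C -> U -> T after amplification
        else:
            out.append(base)  # methylated C and non-C bases are unchanged
    return "".join(out)


reference = "ACGTTCGACCG"  # hypothetical sequence containing three CpGs
print(cpg_positions(reference))  # [1, 5, 9]
print(bisulfite_convert(reference, {1, 9}))  # ACGTTTGATCG
```

The non-CpG cytosine at index 8 is also converted. For the same reason, primers designed for bisulfite-treated DNA generally assume that non-CpG cytosines read as T.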

In certain embodiments, whole genome bisulfite sequencing, which is a high-throughput genome-wide analysis of DNA methylation, can be used to determine the methylation status of multiple genomic loci. It is based on sodium bisulfite conversion of genomic DNA, as described above, followed by sequencing on a next-generation sequencing platform. The resulting sequences are then aligned to the reference genome. The methylation state of each cytosine (e.g., of CpG dinucleotides) within the analyzed genomic loci is determined from the mismatches that result from converting unmethylated cytosines to uracil, which is read as thymine after amplification.
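The mismatch-based methylation calling described above can be sketched as follows. At each reference CpG, a read showing C indicates a methylated cytosine and a read showing T indicates an unmethylated one. The methylation level is the fraction C/(C+T). This sketch is only an illustration under stated assumptions: alignments are gapless, on the forward strand, and given as (0-based start, read sequence) pairs. It ignores the complementary strand, CIGAR operations, and base qualities, all of which real aligners and methylation extractors such as Bismark handle. The function name `call_cpg_methylation` is hypothetical.

```python
from collections import defaultdict

# Sketch: per-CpG methylation calls from bisulfite reads.
# Assumptions: gapless, forward-strand alignments; no base-quality filtering.


def call_cpg_methylation(reference, alignments):
    """Return {cpg_index: (methylated, unmethylated, level)}.

    reference: reference sequence (str).
    alignments: iterable of (start, read_sequence), start being 0-based.
    """
    reference = reference.upper()
    cpgs = {i for i in range(len(reference) - 1) if reference[i:i + 2] == "CG"}
    counts = defaultdict(lambda: [0, 0])  # [C calls, T calls] per CpG
    for start, read in alignments:
        for offset, base in enumerate(read.upper()):
            pos = start + offset
            if pos not in cpgs:
                continue
            if base == "C":  # protected from conversion -> methylated
                counts[pos][0] += 1
            elif base == "T":  # converted -> unmethylated
                counts[pos][1] += 1
            # any other base is a genuine mismatch and is ignored
    result = {}
    for pos in sorted(counts):
        meth, unmeth = counts[pos]
        total = meth + unmeth
        if total:
            result[pos] = (meth, unmeth, meth / total)
    return result


reference = "ACGTTCGACCG"  # hypothetical reference with CpGs at 1, 5, 9
reads = [(0, "ACGTTTGATCG"), (0, "ATGTTTGATTG"), (4, "TCGATCG")]
for pos, (meth, unmeth, level) in call_cpg_methylation(reference, reads).items():
    print(pos, meth, unmeth, round(level, 3))
```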

In certain embodiments, genome-wide DNA methylation profiling can be performed using commercially-available arrays, thereby allowing the interrogation of multiple genomic loci, e.g., multiple CpG sites. Non-limiting examples of such arrays include HumanMethylation BeadChips (Illumina, San Diego, Calif.) and Infinium MethylationEPIC kit (Illumina). Additional methods for analyzing the methylation state of multiple genomic loci are provided in Yong et al., Epigenetics & Chromatin 2016; 9:26, which is incorporated by reference herein.

Kits of this Example

This Example provides kits for diagnosing, monitoring, classifying, and/or treating a subject with endometriosis. The kits described herein can comprise a means for determining and/or detecting the methylation status of one or more genomic loci.

Kits of this Example can include, without limitation, packaged probe and primer sets (e.g., TaqMan probe/primer sets), arrays/microarrays, which further contain one or more probes, primers, or other detection reagents for determining the methylation state and/or level of one or more genomic loci. For example, a kit described herein can include one or more probes or primers for detecting the methylation state of one or more genomic loci. In certain embodiments, the one or more genomic loci comprise a CpG site. In certain embodiments, one or more of the genomic loci do not comprise a CpG site. For example, about 5% or more, about 10% or more, about 15% or more, about 20% or more, about 25% or more, about 30% or more, about 35% or more, about 40% or more, about 45% or more, about 50% or more, about 55% or more, about 60% or more, about 65% or more or about 70% or more of the one or more genomic loci detected by the primers or probes can comprise one or more CpG sites.

In certain non-limiting embodiments, a primer and/or probe described herein can be at least about 10 nucleotides or at least about 15 nucleotides or at least about 20 nucleotides in length, and/or up to about 200 nucleotides or up to about 150 nucleotides or up to about 100 nucleotides or up to about 75 nucleotides or up to about 50 nucleotides in length.

In a further non-limiting embodiment, the oligonucleotide primers and/or probes can be immobilized on a solid surface or support, for example, on a nucleic acid microarray, wherein the position of each oligonucleotide primer and/or probe bound to the solid surface or support is known and identifiable.

In certain non-limiting embodiments, the kits described herein can additionally include other components such as a buffer, enzymes such as DNA polymerases or ligases, nucleotides such as deoxynucleotide triphosphates, positive control sequences, and/or negative control sequences necessary to carry out an assay or reaction to detect the methylation state of a genomic locus.

In certain embodiments, the kits described herein can include a container comprising one or more probes and/or primers for detecting the methylation state of one or more genomic loci. In certain embodiments, the kits further include instructions for use, e.g., the instructions can describe that a particular methylation status of a genomic locus is indicative of endometriosis in a subject. The instructions can be printed directly on the container (when present), or as a label applied to the container, or as a separate sheet, pamphlet, card or folder supplied in or with the container.

Reports, Programmed Computers, and Systems of this Example

In certain embodiments, a diagnosis and/or monitoring of endometriosis in a subject based on the methylation status of one or more genomic loci as described herein can be referred to herein as a “report.” A tangible report can optionally be generated as part of a testing process (which can be interchangeably referred to herein as “reporting,” or as “providing” a report, “producing” a report or “generating” a report).

Examples of tangible reports can include, without limitation, reports in paper (such as computer-generated printouts of test results) or equivalent formats and reports stored on computer readable medium (such as a CD, USB flash drive or other removable storage device, computer hard drive, or computer network server, etc.). Reports, particularly those stored on computer readable medium, can be part of a database, which can optionally be accessible via the internet (such as a database of patient records or genetic information stored on a computer network server, which can be a “secure database” that has security features that limit access to the report, such as to allow only the patient and the patient's medical practitioners to view the report while preventing other unauthorized individuals from viewing the report, for example). In addition to, or as an alternative to, generating a tangible report, reports can also be displayed on a computer screen (or the display of another electronic device or instrument).

A report can include, for example, an individual's medical history, or can just include size, presence, absence, or levels of one or more markers (for example, a report on computer readable medium such as a network server can include hyperlink(s) to one or more journal publications or websites that describe the medical/biological implications). Thus, for example, the report can include information of medical/biological significance as well as optionally also including information regarding the methylation status of relevant genomic loci, or the report can just include information regarding the methylation status of relevant genomic loci without other medical/biological significance.

A report can further be “transmitted” or “communicated” (these terms can be used herein interchangeably), such as to the individual who was tested, a medical practitioner (e.g., a doctor, nurse, clinical laboratory practitioner, genetic counselor, etc.), a healthcare organization, a clinical laboratory, and/or any other party or requester intended to view or possess the report. The act of “transmitting” or “communicating” a report can be by any means known in the art, based on the format of the report. Furthermore, “transmitting” or “communicating” a report can include delivering a report (“pushing”) and/or retrieving (“pulling”) a report. For example, reports can be transmitted/communicated by various means, including being physically transferred between parties (such as for reports in paper format) such as by being physically delivered from one party to another, or by being transmitted electronically or in signal form (e.g., via e-mail or over the internet, by facsimile, and/or by any wired or wireless communication methods known in the art) such as by being retrieved from a database stored on a computer network server, etc.

In certain embodiments, the disclosed subject matter provides computers (or other apparatus/devices such as biomedical devices or laboratory instrumentation) programmed to carry out the methods described herein, e.g., to perform the algorithm disclosed herein (see Example Embodiment H). In certain embodiments, the system can be controlled by the individual and/or their medical practitioner in that the individual and/or their medical practitioner requests the test, receives the test results back and (optionally) acts on the test results to reduce the individual's endometriosis risk or treat the individual, such as by implementing an endometriosis management system.

Example Embodiment G of Example 4: Discovery of Putative Uterine/Endometrial-Derived Nucleic Acids in Human Plasma Via Epigenomic Liquid Biopsy

Example Embodiment G provides an analysis of epigenomic liquid biopsy of human plasma for the discovery of putative uterine/endometrial-derived cell-free DNA (cfDNA) methylation signatures. Example Embodiment G used solution-phase hybridization and high-throughput bisulfite sequencing to compare DNA methylation signatures of cfDNA obtained from the plasma samples of women who had previously had a hysterectomy with those from control women who had not previously had a hysterectomy.

A total of n=9 women (the case group) who had previously had a hysterectomy and n=11 women (the control group) who had not previously had a hysterectomy were included. No women in the control group were pregnant at the time of analysis.

Methods

DNA was extracted from plasma volumes ranging from ~6.3 to 8.1 mL using the NucleoSnap DNA Plasma Kit (Macherey-Nagel) and quantified by Agilent Bioanalyzer. The average yield of cfDNA was 5.625 ng/mL plasma. DNA sequencing libraries were prepared using the Kapa Hyper Prep Kit (Roche). The Zymo DNA Methylation Direct Kit was used for bisulfite conversion, and targeted libraries were prepared using SeqCap-Epi (Roche). Libraries were sequenced on an Illumina HiSeq 2500 instrument using 150 bp paired-end reads.

Reads were trimmed for quality and adapter sequences using Trim-Galore. The reads were aligned to the human reference sequence (GRCh38/hg38) using Bismark in bowtie2 mode. Alignment was performed in paired-end mode and, for read pairs that did not align as pairs, in single-end mode. Duplicate reads were removed using Bismark. Methylation was called on the paired-end and single-end alignments, and the calls were then merged. Average on-target coverage was 44.49×.

CpG methylation calls with read depth of at least 10× were read into MethylSig for differential methylation analysis. MethylKit was used for sliding window analysis with a window size of 250 bp and step size of 50 bp. Differentially methylated regions were identified according to (insert paper reference).
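The coverage filter and sliding-window step described above can be sketched in plain Python. This is conceptually similar to methylKit's tiling (`tileMethylCounts`), but it is not methylKit's code. Here, windows of 250 bp start at positions 1, 51, 101, and so on. Each window sums the methylated and total read counts of the CpGs that pass the 10× depth filter. The window-start convention is an assumption. The Table 11 coordinates (e.g., 5229601, 5229551) do happen to fall on this 1 + 50k grid. The function name and the input record layout are hypothetical.

```python
from collections import defaultdict

WINDOW_SIZE = 250  # bp, as stated for Example Embodiment G
STEP_SIZE = 50  # bp
MIN_DEPTH = 10  # minimum reads per CpG call


def tile_methylation(cpg_calls, size=WINDOW_SIZE, step=STEP_SIZE, min_depth=MIN_DEPTH):
    """Aggregate per-CpG counts into overlapping sliding windows.

    cpg_calls: iterable of (chrom, pos, methylated_reads, total_reads),
               with 1-based positions.
    Returns {(chrom, start, end): (n_cpgs, meth_reads, total_reads, level)}.
    Windows start at 1, 1 + step, 1 + 2*step, ... and span `size` bp.
    """
    acc = defaultdict(lambda: [0, 0, 0])  # [n_cpgs, meth_reads, total_reads]
    for chrom, pos, meth, total in cpg_calls:
        if total <= 0 or total < min_depth:
            continue  # drop low-coverage CpG calls before tiling
        k_min = max(0, -((size - pos) // step))  # ceil((pos - size) / step)
        k_max = (pos - 1) // step
        for k in range(k_min, k_max + 1):  # every window containing pos
            start = 1 + k * step
            key = (chrom, start, start + size - 1)
            acc[key][0] += 1
            acc[key][1] += meth
            acc[key][2] += total
    return {key: (n, m, t, m / t) for key, (n, m, t) in sorted(acc.items())}


calls = [("chr1", 120, 6, 10), ("chr1", 130, 3, 12), ("chr1", 140, 5, 8)]
for window, stats in tile_methylation(calls).items():
    print(window, stats)
```

The CpG at position 140 has only 8× depth, so it is excluded. Each of the three windows covering positions 120 and 130 therefore aggregates two CpGs.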

Results

Analysis of the plasma epigenomic liquid biopsy data using MethylSig revealed 42,403 significant CpG sites that were differentially methylated between cases and controls. Similarly, analysis of the same data using MethylKit revealed 16,562 significant 250 bp sliding windows (identified by a Bonferroni-corrected p-value and a confidence-interval methylation difference of at least 5%) that were differentially methylated between cases and controls.
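The two-part significance filter mentioned above can be sketched as a Bonferroni correction followed by an effect-size cut-off. This is a simplified illustration under assumptions. The significance level `alpha=0.05` is hypothetical. The 5% difference is applied to the point estimate of the methylation difference, whereas the text refers to a confidence-interval-based difference whose exact construction is not specified here. Window identifiers and values are hypothetical.

```python
# Sketch: Bonferroni correction plus a minimum absolute methylation difference.
# Assumptions: alpha = 0.05 (hypothetical); the 5-point cut-off is applied to the
# point estimate of the difference, not to a confidence-interval bound.


def select_significant(windows, alpha=0.05, min_abs_diff=5.0):
    """Return [(window_id, bonferroni_p, diff)] for windows passing both filters.

    windows: list of (window_id, raw_p_value, meth_diff_percentage_points).
    """
    n_tests = len(windows)
    selected = []
    for window_id, p_value, diff in windows:
        p_bonf = min(1.0, p_value * n_tests)  # Bonferroni-adjusted p-value
        if p_bonf < alpha and abs(diff) >= min_abs_diff:
            selected.append((window_id, p_bonf, diff))
    return selected


windows = [
    ("w1", 1e-10, 15.2),  # significant, large increase
    ("w2", 1e-10, 3.0),  # significant, but difference too small
    ("w3", 0.02, 20.0),  # fails after correction (0.02 * 4 = 0.08)
    ("w4", 1e-8, -12.8),  # significant, large decrease
]
print([w for w, _, _ in select_significant(windows)])
```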

To further explore the data, Example Embodiment G used the Genotype-Tissue Expression (GTEx) project database. GTEx is an ongoing effort to build a comprehensive public resource for studying tissue-specific gene expression and regulation. Example Embodiment G identified genes whose expression is low in whole blood and comparatively high in uterine tissue. The rationale was that leukocytes are a major (though not exclusive) contributor to cfDNA in plasma, whereas the uterus is likely to be the sole contributor of uterine/endometrial-derived cfDNA in plasma. Tissue-specific differences in gene expression were therefore used to help interpret the DNA methylation data and to assist in the discovery of putative uterine/endometrial-derived cfDNA in plasma. Specifically, genes were identified whose expression was elevated by at least 5-fold in uterus compared to whole blood and whose expression in whole blood was below two transcripts per million (TPM) (from GTEx). These genes were then merged with the list of n=16,562 significant windows identified via sliding-window analysis (see above). This identified 3,538 significantly differentially methylated windows (not shown) located within and adjacent to the tissue-specific differentially expressed genes. The top n=30 most significantly differentially methylated windows from this output are listed in Table 11. These windows are examples of specific DNA methylation differences that may be detected in the relevant biological samples listed herein. They demonstrate that DNA methylation differences such as these, and many others, exist and can be used to differentiate samples as described herein, for example, using procedures similar to those set forth in Example Embodiment H.
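The expression-based filter and the merge with significant windows can be sketched as two small steps. This is an illustration under assumptions. Expression is given as one (uterus TPM, whole-blood TPM) pair per gene, for example summary values taken from GTEx. A gene is retained if its blood TPM is below 2 and its uterus TPM is at least 5 times its blood TPM. "Adjacent to" is modelled with a hypothetical `flank` parameter, because the text does not state a distance. Gene names and coordinates are hypothetical placeholders, not data from this Example.

```python
# Sketch: select uterus-enriched, blood-low genes, then keep windows that overlap
# them (optionally padded by a hypothetical flank).


def tissue_specific_genes(expression, min_fold=5.0, max_blood_tpm=2.0):
    """expression: {gene: (uterus_tpm, whole_blood_tpm)} -> set of gene names."""
    selected = set()
    for gene, (uterus_tpm, blood_tpm) in expression.items():
        if uterus_tpm <= 0 or blood_tpm >= max_blood_tpm:
            continue  # not expressed in uterus, or not low in blood
        if uterus_tpm >= min_fold * blood_tpm:
            selected.add(gene)
    return selected


def windows_near_genes(windows, gene_coords, genes, flank=0):
    """Pair each window with every selected gene it overlaps after padding
    the gene by `flank` bp on each side.

    windows: iterable of (chrom, start, end); gene_coords: {gene: (chrom, start, end)}.
    """
    hits = []
    for chrom, start, end in windows:
        for gene in sorted(genes):
            g_chrom, g_start, g_end = gene_coords[gene]
            if chrom == g_chrom and start <= g_end + flank and end >= g_start - flank:
                hits.append(((chrom, start, end), gene))
    return hits


expression = {"GENE_A": (40.0, 0.5), "GENE_B": (10.0, 3.0), "GENE_C": (4.0, 1.0)}
coords = {"GENE_A": ("chr19", 5229000, 5230000), "GENE_B": ("chr1", 1, 1000),
          "GENE_C": ("chr6", 1, 1000)}
genes = tissue_specific_genes(expression)
windows = [("chr19", 5229601, 5229850), ("chr19", 5300001, 5300250)]
print(genes, windows_near_genes(windows, coords, genes))
```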

TABLE 11

| Chromosome | Coordinate | methControl | methCase | meth.diff | p-value | Fold Change |
|---|---|---|---|---|---|---|
| chr19 | 5229601 | 15.04% | 30.22% | 15.18% | 1.12E−131 | 41.3317959 |
| chr19 | 5229551 | 12.48% | 27.81% | 15.33% | 1.31E−121 | 41.3317959 |
| chr19 | 5229501 | 11.21% | 27.39% | 16.17% | 9.03E−121 | 41.3317959 |
| chr19 | 5229451 | 11.38% | 30.18% | 18.80% | 3.98E−116 | 41.3317959 |
| chr19 | 719301 | 22.35% | 41.33% | 18.97% | 2.94E−114 | 54.1302115 |
| chr1 | 240492801 | 35.76% | 49.22% | 13.47% | 4.27E−114 | 7.42897026 |
| chr19 | 719251 | 22.30% | 39.77% | 17.46% | 7.39E−114 | 54.1302115 |
| chr10 | 133529601 | 6.01% | 12.98% | 6.97% | 1.57E−113 | 5.95805128 |
| chr10 | 133529551 | 6.74% | 13.90% | 7.16% | 6.14E−113 | 5.95805128 |
| chr1 | 240492851 | 35.08% | 47.05% | 11.97% | 3.27E−109 | 7.42897026 |
| chr19 | 5229651 | 16.64% | 31.59% | 14.96% | 8.96E−108 | 41.3317959 |
| chr10 | 133529651 | 6.16% | 13.22% | 7.06% | 6.55E−107 | 5.95805128 |
| chr1 | 240492751 | 36.14% | 49.74% | 13.60% | 8.86E−107 | 7.42897026 |
| chr1 | 240492901 | 35.19% | 46.74% | 11.55% | 1.43E−102 | 7.42897026 |
| chr10 | 789451 | 15.66% | 30.20% | 14.54% | 1.75E−102 | 26.0717831 |
| chr6 | 37650051 | 32.66% | 43.72% | 11.06% | 1.85E−102 | 5.70832736 |
| chr19 | 719201 | 22.10% | 38.04% | 15.94% | 6.21E−97 | 54.1302115 |
| chr10 | 133529501 | 7.35% | 14.96% | 7.62% | 5.78E−96 | 5.95805128 |
| chr1 | 201650401 | 42.76% | 29.94% | −12.82% | 9.46E−96 | 9.64816418 |
| chr6 | 37648901 | 6.77% | 14.26% | 7.49% | 3.13E−94 | 5.70832736 |
| chr1 | 201650451 | 44.10% | 32.00% | −12.10% | 6.68E−92 | 9.64816418 |
| chr11 | 67071551 | 17.15% | 27.45% | 10.30% | 4.55E−90 | 21.1637777 |
| chr19 | 5229401 | 12.57% | 33.69% | 21.13% | 9.26E−90 | 41.3317959 |
| chr3 | 75396351 | 52.34% | 64.56% | 12.22% | 1.40E−88 | 6.87247351 |
| chr19 | 719351 | 28.35% | 48.18% | 19.83% | 2.46E−88 | 54.1302115 |
| chr6 | 37650001 | 33.33% | 44.40% | 11.07% | 1.49E−87 | 5.70832736 |
| chr6 | 37648951 | 6.55% | 13.54% | 6.99% | 2.08E−87 | 5.70832736 |
| chr10 | 133528001 | 8.42% | 17.33% | 8.91% | 2.04E−86 | 5.95805128 |
| chr1 | 201650351 | 40.04% | 26.56% | −13.48% | 6.13E−86 | 9.64816418 |
| chr3 | 75396251 | 48.26% | 58.90% | 10.64% | 2.97E−85 | 6.87247351 |

Discussion of this Example

This example demonstrates a comprehensive quantitative genome-wide analysis of DNA methylation in human plasma from women who have previously had a hysterectomy and women who have not previously had a hysterectomy. These results demonstrate that epigenomic liquid biopsy of human plasma can be used for quantitative analysis of cfDNA methylation and, in this example, the discovery of putative uterine/endometrial-derived DNA methylation signatures.

Example Embodiment H of Example 4: Estimation of Abnormal Plasma Methylome Variation in Targeted Regions for Diagnosis of Endometriosis

Provided below is an algorithm that can be used to diagnose a subject with endometriosis. The presently disclosed subject matter provides that the methylome(s) of the uterus of a mammal, or of structures therein (e.g., the endometrium), could be affected by certain abnormalities (e.g., endometriosis). Changes in these methylomes can in turn lead to changes in the methylation patterns of the DNA fragments that uterine/endometrial tissues release into plasma. An algorithm was developed to identify changes in the methylation patterns of the plasma methylome caused by uterine/endometrial phenotypes.

The main insight behind this algorithm was that the methylome of the DNA fragments in plasma is a mixture of a variety of component methylomes of uterine/endometrial origin, and that the proportion of these different component methylomes in the mixture varies from subject to subject, even among the population with normal uterine/endometrial phenotypes. By constructing a model of plasma methylome as a linear combination of various component methylomes of uterine/endometrial and other origins, the algorithm can accurately predict the methylation patterns of a new plasma sample under the hypothesis that it is from a normal individual.

Consequently, the algorithm of this Example has high sensitivity for detecting abnormal methylation patterns in a plasma sample caused by changes of the methylomes of some uterine/endometrial or other relevant tissues when the sample is from an affected individual.

The procedure can be applied with little modification to the diagnosis and phenotyping of endometriosis using other types of biopsy samples, such as cervical swabs, vaginal swabs, urine, and tampon blood, provided that the DNA fragments from the tissues affected by endometriosis can be found in those biopsy samples.

Let $i$ be any CpG site in the human genome, $z_{i,j}$ be the methylation level of CpG site $i$ in plasma sample $j$, $p_{i,r,j}$ be the proportion of the $r$-th component methylome $m_{r,j}$ (of uterine/endometrial or other origin) in plasma sample $j$ at site $i$, and $m_{i,r,j}$ be the methylation level of CpG site $i$ in methylome $m_{r,j}$. The hypothesis is:

$$z_{i,j} = \sum_{r=1}^{R} p_{i,r,j}\, m_{i,r,j} \tag{1}$$

where $p_{i,r,j} \ge 0$, $0 \le m_{i,r,j} \le 1$, and $\sum_{r=1}^{R} p_{i,r,j} = 1$.
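Equation (1) says that the observed plasma methylation level at a site is a convex combination of the component methylation levels. As a result, it always lies between the smallest and largest component level. For example, with $R = 2$, $p = (0.9, 0.1)$ and $m = (0.2, 0.8)$, one gets $z = 0.9 \cdot 0.2 + 0.1 \cdot 0.8 = 0.26$. The sketch below evaluates the model for one site and one sample and checks the stated constraints. The function name is hypothetical.

```python
# Sketch: forward evaluation of equation (1) for one CpG site i and sample j.


def mixture_methylation(proportions, component_levels, tol=1e-9):
    """z_ij = sum_r p_irj * m_irj, with p >= 0, sum(p) = 1 and 0 <= m <= 1."""
    if len(proportions) != len(component_levels):
        raise ValueError("need one proportion per component methylome")
    if any(p < 0 for p in proportions) or abs(sum(proportions) - 1.0) > tol:
        raise ValueError("proportions must be non-negative and sum to 1")
    if any(not 0.0 <= m <= 1.0 for m in component_levels):
        raise ValueError("methylation levels must lie in [0, 1]")
    return sum(p * m for p, m in zip(proportions, component_levels))


z = mixture_methylation([0.9, 0.1], [0.2, 0.8])
print(round(z, 6))  # 0.26
```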

It is further assumed that there is a set of CpG sites $S$ such that, for any CpG site $i$ in $S$ and any plasma sample $j$ from a normal individual, $m_{i,r,j} = m_{i,r}$ and $p_{i,r,j} = p_{r,j}$.

That is, it is assumed that in any plasma sample from a normal individual, the proportions of the different component methylomes in the mixture are the same for all CpG sites in $S$. It is also assumed that, when restricted to the set of CpG sites $S$, plasma samples from all normal individuals have the same set of component methylomes. These are called restricted reference component methylomes (RRCM) and are labeled $m_1^S, \ldots, m_R^S$, or simply $m_1, \ldots, m_R$ when there is no confusion. For any plasma sample $j$ from a normal individual, its methylome restricted to the CpG sites in $S$ can be expressed as a weighted average of the restricted reference component methylomes. More precisely, let $z_j^S$ be the methylome of plasma sample $j$ restricted to $S$. Then, for some mixture vector $p_j = [p_{1,j}, \ldots, p_{R,j}]^T$:

$$z_j^S = [m_1^S, \ldots, m_R^S]\, p_j \tag{2}$$

Finally, it is assumed that the set $S$ is the union of two disjoint subsets $C$ and $T$. Here $T$ is a union of $K$ non-empty sets, $T = \bigcup_{k=1}^{K} T_k$, where the index $k$ represents the $k$-th type of abnormal uterine/endometrial phenotype. The sets $T_k$ do not need to be disjoint. Moreover, each $T_k$ is itself the union of two disjoint sets $D_k$ and $V_k$; either $D_k$ or $V_k$ can be empty, but not both. It is assumed that for any plasma sample, including one from an abnormal individual, the methylome restricted to the CpG sites in $C$ can always be expressed as a weighted average of the restricted reference component methylomes. That is, $z_j^C = [m_1^C, \ldots, m_R^C]\, p_j$ regardless of whether $j$ is from an abnormal individual. $C$ is called the set of reference CpG sites. On the other hand, for a plasma sample $l$ from an abnormal individual, the methylome restricted to the CpG sites in $S = C \cup T$ can no longer be expressed as a weighted average of the restricted reference component methylomes. That is, $w_l^S \ne [m_1^S, \ldots, m_R^S]\, p_l$ for any mixture vector $p_l$. More specifically, for a plasma sample $l$ from an individual with the $k$-th type of abnormal phenotype:

1) $w_l^C = [m_1^C, \ldots, m_R^C]\, p_l$;
2) if $D_k$ is non-empty, then $w_l^{D_k} = [m_{1,k}^{D_k}, \ldots, m_{R,k}^{D_k}]\, p_l$, where $[m_1^{D_k}, \ldots, m_R^{D_k}] \ne [m_{1,k}^{D_k}, \ldots, m_{R,k}^{D_k}]$; and
3) if $V_k$ is non-empty, then $w_l^{V_k} = [m_1^{V_k}, \ldots, m_R^{V_k}]\, q_l$, where $p_l \ne q_l$.

In other words, consider a plasma sample $l$ from an individual with the $k$-th type of abnormal phenotype. If $D_k$ is not empty, the component methylomes of sample $l$ restricted to $D_k$ are no longer the same as the reference component methylomes restricted to $D_k$. If $V_k$ is not empty, the proportions of the reference component methylomes restricted to $V_k$ in this plasma sample are no longer the same as the proportions of the reference component methylomes restricted to $C$.

$T$ is called the target set of CpG sites, $D_k$ is called the differential methylation target set, $V_k$ is called the copy number variation target set, and $T_k$ is called the target set for the $k$-th type of abnormal phenotype.

The main steps of the algorithm of this Example are:

1) Identify the set of reference CpG sites $C$ and the target sets $T_1, \ldots, T_K$ for the $K$ types of abnormal individuals.
2) Estimate the restricted reference component methylomes $m_1, \ldots, m_R$, or $R$ predictor methylomes $n_1, \ldots, n_R$ that are independent linear combinations of the reference component methylomes, such that $n_r = [m_1, \ldots, m_R]\, q_r$ for $R$ linearly independent mixture vectors $q_1, \ldots, q_R$.
3) (Optional) If the reference component methylomes are available, estimate the proportions of these components at the reference CpG sites $C$ for the test plasma samples.
4) Predict the methylation levels of the test plasma samples at the target set $T_k$ of CpG sites, under the hypothesis that each sample is from a normal individual.
5) Compare the predicted methylation levels at $D_k$ and $V_k$ against the observed methylation levels, and reject the null hypothesis that a test sample is from a normal individual if the observed methylation levels are significantly different from the predicted levels.
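Steps 4 and 5 can be sketched once mixture proportions for a test sample are available (e.g., from step 3). The prediction at each target site is the proportion-weighted sum of the reference component levels at that site. The comparison uses a sum of squared standardized residuals over the target sites. This test statistic is an assumption chosen for illustration. The text only requires a significant difference. Under an additional assumption of independent Gaussian residuals, the statistic would be approximately chi-square with one degree of freedom per target site. The per-site residual standard deviations (`sd`) would be estimated from normal plasma samples. The rejection `threshold` and all function names are hypothetical.

```python
# Sketch of steps 4-5: predict target-site methylation under the "normal"
# hypothesis and score the deviation of the observed levels.
# Assumptions: D_k and V_k are pooled into one set of target sites; sd and
# threshold are hypothetical inputs estimated or chosen elsewhere.


def predict_targets(proportions, reference_components):
    """Step 4: {site: sum_r p_r * m_{site,r}} for each target site."""
    return {site: sum(p * m for p, m in zip(proportions, levels))
            for site, levels in reference_components.items()}


def deviation_statistic(observed, predicted, sd):
    """Step 5: sum over target sites of ((observed - predicted) / sd)^2."""
    return sum(((observed[s] - predicted[s]) / sd[s]) ** 2 for s in predicted)


def is_abnormal(observed, proportions, reference_components, sd, threshold):
    """Return (reject_normal_hypothesis, statistic)."""
    predicted = predict_targets(proportions, reference_components)
    stat = deviation_statistic(observed, predicted, sd)
    return stat > threshold, stat


components = {"t1": [0.1, 0.9], "t2": [0.5, 0.5]}  # hypothetical target sites
sd = {"t1": 0.02, "t2": 0.02}
p_hat = [0.8, 0.2]  # e.g., estimated at reference sites C in step 3
print(is_abnormal({"t1": 0.27, "t2": 0.49}, p_hat, components, sd, threshold=10.0))
print(is_abnormal({"t1": 0.40, "t2": 0.50}, p_hat, components, sd, threshold=10.0))
```

The first sample stays close to the predicted levels (0.26 and 0.50) and is not rejected. The second deviates by 0.14 at `t1` and is rejected.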

The algorithm of this Example can be implemented in a variety of ways. For example, given methyl-seq data for a set of plasma samples from normal individuals, the presently disclosed EM algorithm or the data augmentation method can be applied to estimate the component methylomes. The maximum likelihood method can then be used to estimate the proportions of these component methylomes in a test sample. Below are exemplary simple implementations of the presently disclosed algorithm that use linear regression.

In the simplest implementation of the algorithm of this Example, it is assumed that the restricted methylome of a plasma sample from a normal individual can be approximated by a mixture of two restricted reference methylomes. It is further assumed that estimates of these two reference component methylomes are available. For example, in the implementation below, for the genomic loci of interest, the plasma methylome is approximated by a mixture of leukocyte-derived and uterine/endometrial- or other relevant tissue/cell-derived methylomes. The implementation of the algorithm includes the following steps:

1. Identify the Reference Set C, and the Target Sets T₁, . . . , T_(K).

1.1 Collect the methylation data for a set of leukocyte-derived samples, a set of uterine/endometrial- or other relevant tissue/cell-derived samples, and a set of plasma samples, all from normal individuals. For each type of abnormal individual, collect a set of leukocyte-derived samples, a set of uterine/endometrial- or other relevant tissue/cell-derived samples, and a set of plasma samples from individuals of that type. All of these samples should be matched for age, race, and other relevant parameters. These are the training data.

1.2 Let x_(i,j) be the observed methylation level of CpG site i in a normal leukocyte-derived sample j, y_(i,l) the observed methylation level of CpG site i in a normal uterine/endometrial- or other relevant tissue/cell-derived sample l, s_(x,i)² the sample variance of x_(i,j) over all normal leukocyte-derived samples, and s_(y,i)² the sample variance of y_(i,l) over all normal uterine/endometrial- or other relevant tissue/cell-derived samples. Identify the set S₀ of CpG sites such that every i∈S₀ has both s_(x,i)²<c₀ and s_(y,i)²<c₀ for some constant c₀. These are CpG sites with stable methylation levels in each type of normal cell.

1.3 Let x_(i,j) be the observed methylation level of CpG site i in a leukocyte-derived sample j, normal or abnormal, y_(i,l) the observed methylation level of CpG site i in a uterine/endometrial- or other relevant tissue/cell-derived sample l, normal or abnormal, s_(x,i)² the sample variance of x_(i,j) over all leukocyte-derived samples, normal and abnormal, and s_(y,i)² the sample variance of y_(i,l) over all uterine/endometrial- or other relevant tissue/cell-derived samples, normal and abnormal. Identify the set S₁ of CpG sites such that every i∈S₁ has both s_(x,i)²<c₀ and s_(y,i)²<c₀ for some constant c₀; the statistical test for the difference between {x_(i,j₀): j₀ is a normal leukocyte-derived sample} and {x_(i,j_k): j_k is an abnormal leukocyte-derived sample of type k} is not significant for any abnormal type k; and the statistical test for the difference between {y_(i,j₀): j₀ is a normal uterine/endometrial- or other relevant tissue/cell-derived sample} and {y_(i,j_k): j_k is an abnormal uterine/endometrial- or other relevant tissue/cell-derived sample of type k} is not significant for any abnormal type k. These are CpG sites with stable methylation levels in each type of cell and with no difference in methylation level between normal samples and samples of any abnormal type. Let x_(i) be the sample mean of x_(i,j) over all leukocyte-derived samples, normal and abnormal, and y_(i) the sample mean of y_(i,l) over all uterine/endometrial- or other relevant tissue/cell-derived samples, normal and abnormal. Identify the subset C₀ of S₁ such that every i∈C₀ has |x_(i)−y_(i)|>c₁ for some constant c₁. These are CpG sites that are stably methylated in each cell type, show no difference between the normal and abnormal samples of the same cell type, and are differentially methylated between different cell types.

1.4 Let x^(C₀) be the vector of x_(i) for all i∈C₀, and y^(C₀) the vector of y_(i) for all i∈C₀, where x_(i) is the mean methylation level at site i over all leukocyte-derived samples and y_(i) is the mean methylation level at site i over all uterine/endometrial- or other relevant tissue/cell-derived samples. Because of the way C₀ is selected, no CpG site in C₀ differs in methylation level between normal and abnormal leukocyte-derived samples, or between normal and abnormal uterine/endometrial- or other relevant tissue/cell-derived samples. Let z_(j)^(C₀) be the observed methylation levels of the CpG sites in C₀ for a plasma sample j of the k^(th) abnormal type. (For convenience, a normal plasma sample is treated as a sample of the 0^(th) abnormal type.) For each sample j of the k^(th) abnormal type, regress z_(j)^(C₀) against x^(C₀) and y^(C₀), with the constraints that the intercept must be 0 and the coefficients must be non-negative and sum to 1, and obtain the residual e_(j)^(C₀). Identify the subset C₀^(k) of C₀ such that every CpG site i in C₀^(k) has

$\frac{e_{i,k}^{2}}{s_{i,k}^{2}} < c_{2}$

and e_(i,k)²<c₃ for some constants c₂ and c₃, where e_(i,k)² is the mean of the squared differences between the estimated and observed methylation levels of CpG site i over all plasma samples of the k^(th) abnormal type, and s_(i,k)² is the sample variance of the methylation levels of CpG site i in the same set of plasma samples. Repeat this procedure for each type of abnormal plasma sample; the intersection C=∩_(k=0)^(K) C₀^(k) is the reference set of CpG sites. These are CpG sites whose methylation levels in both normal plasma samples and plasma samples of any abnormal type can be accurately predicted by the reference component methylomes from normal individuals.

1.5 Let T₀=S₀\S₁. Let x^(C) and x^(T₀) be the vectors of x_(i) and x_(h) for all i∈C and h∈T₀, respectively, and y^(C) and y^(T₀) the vectors of y_(i) and y_(h) for all i∈C and h∈T₀, respectively, where x_(i) and x_(h) are the mean methylation levels at sites i and h over normal leukocyte-derived samples, and y_(i) and y_(h) are the corresponding means over normal uterine/endometrial- or other relevant tissue/cell-derived samples. Let z_(j)^(C) and z_(j)^(T₀) be the observed methylation levels of the CpG sites in C and T₀, respectively, for a normal plasma sample j; w_(l_k)^(C) and w_(l_k)^(T₀) the observed methylation levels of the CpG sites in C and T₀, respectively, for a plasma sample l_(k) from an individual with the k^(th) type of abnormality; and w_(l_g)^(C) and w_(l_g)^(T₀) the observed methylation levels of the CpG sites in C and T₀, respectively, for a plasma sample l_(g) from an individual with the g^(th) type of abnormality, where g≠k. For each j, l_(k), and l_(g), regress z_(j)^(C), w_(l_k)^(C), and w_(l_g)^(C), respectively, against x^(C) and y^(C), with the constraints that the intercept must be 0 and the coefficients must be non-negative and sum to 1. Apply the fitted models to x^(T₀) and y^(T₀) to predict z_(j)^(T₀), w_(l_k)^(T₀), and w_(l_g)^(T₀), respectively, and obtain the differences e_(j)^(T₀), e_(l_k)^(T₀), and e_(l_g)^(T₀) between the predicted and observed values. Let e_(i), e_(i,k), and e_(i,g) be the means, for CpG site i, of the sets of differences {e_(j)^(T₀): j is a normal plasma sample}, {e_(l_k)^(T₀): l_(k) is a plasma sample of the k^(th) abnormal type}, and {e_(l_g)^(T₀): l_(g) is a plasma sample of the g^(th) abnormal type}, respectively. Identify the subset T_(k) of T₀ such that every i∈T_(k) has |e_(i)|<c_(2,0), |e_(i,k)|>c_(2,k), and |e_(i,k)−e_(i,g)|>c_(3,k) for all g≠k, for some constants c_(2,0), c_(2,k), and c_(3,k). T_(k) is the target set for the k^(th) type of abnormal individual. These are the sites where the methylation of a normal plasma sample can be accurately predicted, where the observed methylation of a plasma sample of the k^(th) abnormal type deviates from the prediction, and where that deviation differs from the deviation of a plasma sample of any other abnormal type.
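The variance, contrast, and target-set criteria of step 1 are element-wise threshold tests, which can be sketched in NumPy as follows. This is a minimal illustration, assuming methylation levels are stored as site-by-sample arrays with values in [0, 1]; the function names, toy arrays, and thresholds are hypothetical. The same variance filter serves step 1.2 (normal samples only) and step 1.3 (normal and abnormal samples pooled). The normal-versus-abnormal significance tests of step 1.3 and the residual filter of step 1.4 are not shown; their boolean masks would be combined with these using `&`.

```python
import numpy as np


def stable_sites(x, y, c0):
    """Variance filter of steps 1.2/1.3.

    x: (n_sites, n_leukocyte_samples) methylation levels
    y: (n_sites, n_tissue_samples) methylation levels
    Returns a boolean mask of sites with s_(x,i)^2 < c0 and s_(y,i)^2 < c0.
    """
    s2_x = np.var(x, axis=1, ddof=1)  # per-site sample variance, leukocyte
    s2_y = np.var(y, axis=1, ddof=1)  # per-site sample variance, tissue
    return (s2_x < c0) & (s2_y < c0)


def differential_sites(x, y, c1):
    """Contrast filter of step 1.3: |x_i - y_i| > c1 on per-site means."""
    return np.abs(x.mean(axis=1) - y.mean(axis=1)) > c1


def target_sites(e_normal, e_k, e_other, c20, c2k, c3k):
    """Step 1.5 selection of T_k from the candidate sites T_0.

    e_normal: (n_sites,) mean residuals e_i over normal plasma samples
    e_k:      (n_sites,) mean residuals e_(i,k) over type-k plasma samples
    e_other:  (n_other_types, n_sites) mean residuals e_(i,g), one row per g != k
    """
    predictable = np.abs(e_normal) < c20                     # |e_i| < c_(2,0)
    deviates = np.abs(e_k) > c2k                             # |e_(i,k)| > c_(2,k)
    distinct = np.all(np.abs(e_k - e_other) > c3k, axis=0)   # for every g != k
    return predictable & deviates & distinct


# Toy usage with hypothetical values: 3 CpG sites, 3 samples per cell type.
x = np.array([[0.10, 0.12, 0.11], [0.50, 0.90, 0.10], [0.80, 0.81, 0.79]])
y = np.array([[0.90, 0.88, 0.91], [0.50, 0.52, 0.49], [0.79, 0.80, 0.81]])
stable = stable_sites(x, y, c0=0.01)
c0_candidates = stable & differential_sites(x, y, c1=0.3)
```

Here site 2 is excluded for its unstable leukocyte methylation and site 3 for having no contrast between cell types, leaving only site 1 as a C₀ candidate.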

2. Estimate the Component Fractions of the New Plasma Samples to be Tested

Recall that x^(C) and y^(C) are the mean vectors of the methylation levels of the training leukocyte-derived and training uterine/endometrial- or other relevant tissue/cell-derived sample data for the CpG sites in the reference set C. For any new plasma sample t to be tested, let z_(t)^(C) be the observed methylation levels of the CpG sites in C. Regress z_(t)^(C) against x^(C) and y^(C), with the constraints that the intercept must be 0 and the coefficients must be non-negative and sum to 1. The estimated coefficients are the estimated fractions of the component methylomes in plasma sample t.
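For the two-component case, this constrained regression has a simple closed form. The sketch below is one way to compute it, assuming NumPy arrays and that x^(C) and y^(C) differ at some site (which the |x_(i)−y_(i)|>c₁ criterion of step 1.3 ensures, since C⊆C₀); `mixture_fraction` and the toy vectors are hypothetical. With more than two components, a general constrained least-squares solver would be needed instead.

```python
import numpy as np


def mixture_fraction(z, x, y):
    """Fit z ~ p * x + (1 - p) * y with zero intercept and 0 <= p <= 1.

    The sum-to-one constraint reduces the two-coefficient regression to a
    one-dimensional least-squares fit of (z - y) on (x - y). The objective is
    a convex quadratic in p, so clipping the unconstrained minimizer to [0, 1]
    gives the constrained minimizer.
    """
    d = x - y
    p = np.dot(z - y, d) / np.dot(d, d)
    p = float(np.clip(p, 0.0, 1.0))
    return p, 1.0 - p


# Hypothetical reference-set means x^(C), y^(C) and a synthetic sample z_t^(C).
x_C = np.array([0.05, 0.10, 0.90, 0.85, 0.20])
y_C = np.array([0.95, 0.80, 0.10, 0.15, 0.70])
z_t = 0.7 * x_C + 0.3 * y_C
p_leukocyte, p_tissue = mixture_fraction(z_t, x_C, y_C)  # approximately (0.7, 0.3)
```

The two returned values are the estimated leukocyte-derived and tissue-derived fractions, non-negative and summing to 1 by construction.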

3. Test if the New Plasma Samples are from the k^(th) Type of Abnormal Individual.

For the new plasma sample t, let x^(T_k) and y^(T_k) be the mean vectors of the methylation levels of the training leukocyte-derived and training uterine/endometrial- or other relevant tissue/cell-derived sample data for the CpG sites in the target set T_(k) identified in step 1 of this algorithm. Apply the fitted regression model obtained in step 2 of this algorithm to x^(T_k) and y^(T_k) to predict the methylation levels of the CpG sites in T_(k) for sample t under the hypothesis that sample t is from a normal individual. Let n_(k) be the number of CpG sites in T_(k), indexed i=1, . . . , n_(k). Define the functions

$f_{k}(x_{1}, \ldots, x_{n_{k}}) = \sum_{i} (-1)^{I(e_{i,k} - e_{i})} x_{i} \quad \text{and} \quad f_{k,g}(x_{1}, \ldots, x_{n_{k}}) = \sum_{i} (-1)^{I(e_{i,k} - e_{i,g})} x_{i},$

where I(⋅)=I_((−∞,0))(⋅) is the indicator function for the interval (−∞, 0), and e_(i), e_(i,k), and e_(i,g) are the estimates obtained in step 1.5 of the algorithm. The sample is said to be from an individual with the k^(th) type of abnormal phenotype if

$f_{k}(e_{1,t} - e_{1}, \ldots, e_{n_{k},t} - e_{n_{k}}) > c_{4,k} \quad \text{and} \quad f_{k,g}(e_{1,t} - e_{1,g}, \ldots, e_{n_{k},t} - e_{n_{k},g}) > c_{5,g} \text{ for all } g \neq k,$

where e_(i,t) is the difference between the observed methylation level of CpG site i∈T_(k) for sample t and the value predicted by the fitted model obtained in step 2, and g ranges over the types of abnormal phenotype that are different from the k^(th) type.
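The decision rule reduces to sign-weighted sums of residuals, where each site's sign is taken from the direction of the training deviation. A minimal sketch follows, assuming the two-component prediction of step 2 and that both e_(i,t) and the step 1.5 means are computed as observed minus predicted (the text does not fix the sign convention for step 1.5). The function names, toy values, and the thresholds c_(4,k) and c_(5,g) used here are hypothetical.

```python
import numpy as np


def predict_normal(x_T, y_T, p):
    """Null-hypothesis prediction on T_k: p * x^(T_k) + (1 - p) * y^(T_k),
    using the leukocyte fraction p estimated in step 2."""
    return p * x_T + (1.0 - p) * y_T


def signed_sum(values, direction):
    """sum_i (-1)^{I(d_i)} * values_i, with I the indicator of (-inf, 0)."""
    signs = np.where(direction < 0, -1.0, 1.0)
    return float(np.sum(signs * values))


def is_type_k(e_t, e_normal, e_k, e_other, c4k, c5):
    """Step 3 decision rule for abnormal type k.

    e_t:      (n_k,) observed minus predicted methylation of sample t on T_k
    e_normal: (n_k,) training means e_i from step 1.5
    e_k:      (n_k,) training means e_(i,k)
    e_other:  dict g -> (n_k,) training means e_(i,g) for each g != k
    c4k:      threshold c_(4,k); c5: dict g -> threshold c_(5,g)
    """
    if signed_sum(e_t - e_normal, e_k - e_normal) <= c4k:   # f_k test
        return False
    return all(signed_sum(e_t - e_g, e_k - e_g) > c5[g]     # f_(k,g) tests
               for g, e_g in e_other.items())


# Hypothetical training summaries for n_k = 3 target sites and one other type g = 2.
e_normal = np.array([0.0, 0.0, 0.0])
e_k = np.array([0.2, -0.2, 0.1])
e_other = {2: np.array([-0.1, 0.1, 0.0])}

x_T, y_T, p = np.array([0.2, 0.8, 0.5]), np.array([0.6, 0.4, 0.5]), 0.5
observed = np.array([0.58, 0.38, 0.62])
e_t = observed - predict_normal(x_T, y_T, p)  # about [0.18, -0.22, 0.12]
flag = is_type_k(e_t, e_normal, e_k, e_other, c4k=0.3, c5={2: 0.3})
```

Because each term is signed by the expected direction of deviation, residuals that move the way type k predicts add up, while a normal-like sample with small, mixed-sign residuals yields a sum near zero and is not flagged.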

Other ways of implementing the algorithm of this Example can be developed by modifying the simple implementation presented above. Specifically, the algorithm does not need to assume that the plasma methylomes are made up of only two reference component methylomes, nor does it need to approximate them by mixtures of the component methylomes. Instead, a set of predictor methylomes that are themselves mixtures of the reference component methylomes can be collected, as long as the number of predictor methylomes equals the number of reference component methylomes and the mixture vectors of the predictor methylomes are linearly independent. For example, the predictor methylomes can be methylomes of plasma samples with known, different proportions of leukocyte-derived and uterine/endometrial- or other relevant tissue/cell-derived DNA.
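Why linearly independent predictor methylomes suffice can be seen from linear algebra: if the predictor methylomes are N=MQ for component methylomes M and an invertible mixture matrix Q, then any mixture Mp equals N(Q⁻¹p), so a regression on the predictor methylomes yields the same fitted and predicted values as a regression on the components. The following NumPy illustration uses synthetic data; all array names and values are hypothetical. The fit is unconstrained because coefficients on predictor methylomes need not be non-negative or sum to 1, and recovering p as Qb assumes the mixture vectors are known, as with the plasma samples of known proportions described in the preceding paragraph.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical component methylomes M = [m_1, m_2] on reference and target sites.
M_C = rng.uniform(0.0, 1.0, size=(200, 2))
M_T = rng.uniform(0.0, 1.0, size=(20, 2))

# Predictor methylomes n_r = M q_r; the columns q_r of Q are linearly independent.
Q = np.array([[0.8, 0.3],
              [0.2, 0.7]])
N_C, N_T = M_C @ Q, M_T @ Q

# A normal plasma sample: a mixture of the components with proportions p.
p = np.array([0.65, 0.35])
z_C = M_C @ p

# Regress on the predictor methylomes (no sign or sum constraints on b).
b, *_ = np.linalg.lstsq(N_C, z_C, rcond=None)
pred_T = N_T @ b       # prediction at the target sites
p_implied = Q @ b      # component proportions implied by b
```

In this noise-free setting, `pred_T` matches the component-based prediction `M_T @ p` and `p_implied` recovers `p`, up to floating-point error.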

In the algorithm of this Example, the difference between the observed methylation levels in certain target regions and the predicted methylation levels is used as the test statistic to determine whether the methylome of a plasma sample has been affected by endometriosis. To illustrate the advantage of this approach, assume that the mixture vector p_(j) for the methylome of a normal plasma sample j follows a Dirichlet distribution with parameters α₁= . . . =α_(R). Furthermore, assume that for CpG site i, its methylation levels in the R reference component methylomes are m_(i,r)=(r−1)/(R−1), r=1, . . . , R. It can be shown that the methylation level of site i in sample j then has a mean of 0.5 and a variance of

$\frac{R + 1}{12\left( {R - 1} \right)\left( {{R\;\alpha_{1}} + 1} \right)}.$
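The stated mean and variance follow from the moments of a symmetric Dirichlet distribution. As a worked sketch, write μ_(i,j)=Σ_(r)m_(i,r)p_(r,j) for the true methylation level of site i in sample j (μ is notation introduced only for this derivation), and let δ_(rs) denote the Kronecker delta:

$\mathrm{E}[p_{r,j}] = \frac{1}{R}, \qquad \mathrm{Cov}(p_{r,j}, p_{s,j}) = \frac{\delta_{rs}/R - 1/R^{2}}{R\,\alpha_{1} + 1},$

$\mathrm{E}[\mu_{i,j}] = \frac{1}{R}\sum_{r=1}^{R}\frac{r-1}{R-1} = \frac{1}{2},$

$\mathrm{Var}(\mu_{i,j}) = \frac{1}{R\,\alpha_{1}+1}\left[\frac{1}{R}\sum_{r=1}^{R} m_{i,r}^{2} - \left(\frac{1}{R}\sum_{r=1}^{R} m_{i,r}\right)^{2}\right] = \frac{1}{R\,\alpha_{1}+1}\cdot\frac{R^{2}-1}{12\left(R-1\right)^{2}} = \frac{R + 1}{12\left(R - 1\right)\left(R\,\alpha_{1} + 1\right)}.$

The bracketed term is the population variance of the R equally spaced values 0, 1/(R−1), . . . , 1, which equals (R²−1)/(12(R−1)²).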

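If a methyl-seq library of sample j has a coverage of N at CpG site i, and each of the N reads is independently methylated with a probability equal to the methylation level of site i in sample j, then the variance of the measured methylation level z_(i,j) is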

$\sigma_{1}^{2} = {\frac{1}{4N} + {\frac{N - 1}{N}{\frac{R + 1}{12\left( {R - 1} \right)\left( {{R\;\alpha_{1}} + 1} \right)}.}}}$

In other words, if z_(i,j) is used as a test statistic to detect and phenotype endometriosis using a plasma sample, the test statistic has a variance of σ₁² under the null hypothesis. In the presently disclosed algorithm, however, the mixture vector p_(j) is first estimated, and z_(i,j) is then predicted by Σ_(r)m_(i,r)p_(r,j). Note that in methyl-seq data each library can contain millions of CpG sites, and that the variance of the coefficients in a linear regression model is inversely proportional to the sample size. Thus it is possible to obtain a highly accurate estimate of the mixture vector p_(j), even after taking into account that adjacent CpG sites tend to have correlated methylation levels. Assuming that an accurate estimate of Σ_(r)m_(i,r)p_(r,j) can be obtained, that is, that the estimation error can be ignored, the variance of the difference z_(i,j)−Σ_(r)m_(i,r)p_(r,j) between the observed methylation level and the prediction is

$\frac{1}{4N} - {\frac{1}{N}{\frac{R + 1}{12\left( {R - 1} \right)\left( {{R\;\alpha_{1}} + 1} \right)}.}}$

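In other words, under the null hypothesis, the test statistic z_(i,j)−Σ_(r)m_(i,r)p_(r,j) used in the presently disclosed algorithm has a smaller variance than the alternative test statistic z_(i,j). Subtracting the two expressions shows that the variance is reduced by exactly (R+1)/(12(R−1)(Rα₁+1)), the variance of the true methylation level across normal samples, so the reduction is large whenever this term is large compared with 1/(4N). This in turn means that the presently disclosed test achieves a higher power at the same level of type I error.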
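The variance comparison can also be checked numerically. The following sketch simulates the null model described above (a symmetric Dirichlet mixture vector, equally spaced component methylation levels, and N independently methylated reads per site) and compares the empirical variances of z_(i,j) and z_(i,j)−Σ_(r)m_(i,r)p_(r,j) with the two closed-form expressions. The function names, the number of simulated samples, and the parameter values R=4, α₁=1, N=30 are illustrative choices, and, as in the text, the mixture vector is treated as known exactly.

```python
import numpy as np


def simulated_variances(R, alpha, N, n_samples=200_000, seed=0):
    """Monte Carlo estimate of Var(z) and Var(z - mu) at one CpG site.

    Assumes p_j ~ Dirichlet(alpha, ..., alpha), m_r = (r - 1)/(R - 1), and
    z_j = fraction of N reads methylated, each read methylated independently
    with probability mu_j = sum_r m_r p_(r,j). mu_j is treated as exact.
    """
    rng = np.random.default_rng(seed)
    m = np.arange(R) / (R - 1)                            # m_(i,r), r = 1..R
    p = rng.dirichlet(np.full(R, alpha), size=n_samples)  # mixture vectors
    mu = np.clip(p @ m, 0.0, 1.0)                         # guard float round-off
    z = rng.binomial(N, mu) / N                           # observed levels
    return z.var(), (z - mu).var()


def analytic_variances(R, alpha, N):
    """Closed forms from the text: sigma_1^2 and the residual variance."""
    v = (R + 1) / (12 * (R - 1) * (R * alpha + 1))        # Var(mu)
    return 1 / (4 * N) + (N - 1) / N * v, 1 / (4 * N) - v / N


sim_raw, sim_resid = simulated_variances(R=4, alpha=1.0, N=30)
ana_raw, ana_resid = analytic_variances(R=4, alpha=1.0, N=30)
```

With these settings the analytic variances are about 0.0352 for z_(i,j) and 0.0074 for the residual, and their gap equals Var(μ)=5/180≈0.0278; the simulated values should agree to within Monte Carlo error.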

Examples of Embodiments for Example 4

D1. A method for diagnosing, prognosing, classifying, and/or monitoring endometriosis in a mammal, comprising:

(a) obtaining a sample from the mammal;

(b) determining the methylation status and/or level of one or more genomic loci in the sample;

(c) comparing the methylation status and/or level of the one or more genomic loci to a reference or a set of predicted values; and

(d) diagnosing endometriosis in the mammal,

wherein the difference in the methylation status and/or level of the one or more genomic loci in the sample compared to the reference or a set of predicted values indicates the presence of endometriosis in the mammal.

D2. The method of embodiment D1, wherein an increase in the level of methylation of the one or more genomic loci in the sample indicates the presence of endometriosis in the mammal or a decrease in the level of methylation of the one or more genomic loci in the sample indicates the presence of endometriosis in the mammal.

D3. The method of embodiment D1, wherein a decrease in the level of methylation of at least one of the one or more genomic loci in the sample and an increase in the level of methylation of at least one of the one or more genomic loci in the sample indicates the presence of endometriosis in the mammal.

D4. A method of treating endometriosis in a mammal, comprising:

(a) obtaining a sample from the mammal;

(b) determining the methylation status and/or level of one or more genomic loci present in the sample;

(c) comparing the methylation status and/or level of the one or more genomic loci to a reference or a set of predicted values;

(d) diagnosing endometriosis in the mammal, wherein the difference in the methylation status and/or level of the one or more genomic loci in the sample compared to the reference indicates the presence of endometriosis in the mammal; and

(e) administering a hormone therapy, a pain medication, or both to said mammal.

D5. The method of any one of embodiments D1-D4, wherein the reference is the methylation status and/or level of the one or more genomic loci in a sample obtained from a mammal that does not have endometriosis.

D6. The method of any one of embodiments D1-D5, wherein said sample is a plasma sample.

D7. A method of treating endometriosis comprising:

(a) measuring the methylation status and/or level of one or more genomic loci present in a sample from a mammal prior to a treatment of endometriosis;

(b) measuring the methylation status and/or level of one or more genomic loci present in a sample from the mammal during the treatment of endometriosis; and

(c) continuing the treatment if the difference in the methylation status and/or level of the one or more genomic loci between the samples from prior to and during the treatment of endometriosis indicates the mammal is responsive to the treatment.

D8. The method of embodiment D7, further comprising (d) administering a different treatment to the mammal if the difference in the methylation status and/or level of the one or more genomic loci between the samples from prior to and during the treatment of endometriosis indicates the mammal is not responsive to the treatment.

D9. The method of embodiment D7, wherein an increase in the level of methylation of the one or more genomic loci in the sample indicates the mammal is responsive to the treatment, or a decrease in the level of methylation of the one or more genomic loci in the sample indicates the mammal is responsive to the treatment.

D10. The method of embodiment D7, wherein a decrease in the level of methylation of at least one of the one or more genomic loci in the sample and an increase in the level of methylation of at least one of the one or more genomic loci in the sample indicates the mammal is responsive to the treatment.

D11. The method of embodiment D7, wherein an increase in the level of methylation of the one or more genomic loci in the sample indicates the mammal is not responsive to the treatment, or a decrease in the level of methylation of the one or more genomic loci in the sample indicates the mammal is not responsive to the treatment.

D12. The method of embodiment D7, wherein a decrease in the level of methylation of at least one of the one or more genomic loci in the sample and an increase in the level of methylation of at least one of the one or more genomic loci in the sample indicates the mammal is not responsive to the treatment.

D13. The method of any one of embodiments D7-D12, wherein said sample is a plasma sample.

D14. The method of any one of embodiments D1-D13, wherein the one or more genomic loci comprise one or more CpG sites.

D15. The method of any one of embodiments D1-D14, wherein the one or more genomic loci are present within nucleic acids isolated from the sample.

D16. The method of any one of embodiments D1-D15, wherein the one or more genomic loci are present within cell-free nucleic acids isolated from the sample.

D17. A method of treating endometriosis, comprising:

(a) diagnosing endometriosis in a mammal by utilization of the algorithm disclosed in Example Embodiment H; and

(b) administering a hormone therapy, a pain medication, or both to said mammal to treat said endometriosis.

D18. The method of any one of embodiments D1-D17, wherein said mammal is a human.

D19. A kit for diagnosing, prognosing, and/or monitoring endometriosis in a mammal comprising a means for determining and/or detecting the methylation status of one or more genomic loci.

D20. The kit of embodiment D19, wherein the means comprises one or more primers and/or probes for determining and/or detecting the methylation status of the one or more genomic loci.

Abstract of this Example (Example 4)

This Example provides methods for diagnosing, prognosing, monitoring, classifying, and/or treating endometriosis. For example, algorithms, kits, and methods for diagnosing, prognosing, monitoring, classifying, and/or treating endometriosis are provided.

Example 5—Methods and Materials for Assessing and Treating Necrotizing Enterocolitis

Field of this Example

This Example relates to methods and materials involved in assessing a mammal (e.g., a human infant) for and/or treating a mammal (e.g., a human infant) having or developing necrotizing enterocolitis. For example, this Example provides methods and materials for using DNA methylation profiles of nucleic acid within a biopsy (e.g., a plasma sample, a blood sample, or a stool sample) to determine whether or not a mammal has, or is developing, necrotizing enterocolitis. This Example also provides methods, algorithms, and kits for diagnosing, prognosing, monitoring, classifying, and/or treating necrotizing enterocolitis.

Background of this Example

Necrotizing enterocolitis affects low-birth-weight infants in the first weeks of life, with a reported frequency of between 1 and 5 percent of neonatal intensive care unit admissions and mortality rates of between 15 and 30 percent. Necrotizing enterocolitis currently is suspected through visualization of a distended abdomen and confirmed via x-ray. By the time a final diagnosis is reached through this process, however, the disease has already progressed to a point that typically requires surgical intervention, which carries a 50 percent mortality rate.

Summary of this Example

This Example provides methods and materials involved in assessing a mammal (e.g., a human) for and/or treating a mammal (e.g., a human) having or developing necrotizing enterocolitis. For example, this Example provides methods and materials for using DNA methylation profiles of nucleic acid within a biopsy (e.g., a plasma sample, a blood sample, or a stool sample) to determine whether or not a mammal has, or is developing, necrotizing enterocolitis. Determining if a mammal (e.g., a human) has, or is likely to develop, necrotizing enterocolitis by assessing DNA methylation profiles of nucleic acid within a biopsy (e.g., a plasma sample, a blood sample, or a stool sample) can aid in the identification of mammals (e.g., humans) that should be treated in a particular manner (e.g., by administering an antibiotic therapy, by feeding intravenously as opposed to by mouth, and/or by performing a surgical treatment), for example, early in the disease process.

This Example also provides methods for diagnosing, prognosing, monitoring, classifying, and/or treating necrotizing enterocolitis. For example, the methods described herein can include determining the methylation status of one or more genomic loci in a sample (e.g., a plasma sample, blood sample, or a stool sample) of a mammal (e.g., a human infant). This Example further provides algorithms and kits for diagnosing, prognosing, monitoring, classifying, and/or treating necrotizing enterocolitis.

In one aspect, the present disclosure provides a method for diagnosing, prognosing, classifying, and/or monitoring necrotizing enterocolitis in a mammal (e.g., a human infant) comprising: (a) obtaining a sample (e.g., a plasma sample, a blood sample, or a stool sample) from the mammal; (b) determining the methylation status and/or level of one or more genomic loci in the sample; (c) comparing the methylation status and/or level of the one or more genomic loci to a reference or a set of predicted values; and (d) identifying the mammal as having necrotizing enterocolitis, wherein the difference in the methylation status and/or level of the one or more genomic loci in the sample compared to the reference or a set of predicted values indicates the presence of necrotizing enterocolitis in the mammal.

In another aspect, this Example provides a method of treating necrotizing enterocolitis in a mammal (e.g., a human infant) comprising: (a) obtaining a sample (e.g., a plasma sample, a blood sample, or a stool sample) from the mammal; (b) determining the methylation status and/or level of one or more genomic loci present in the sample; (c) comparing the methylation status and/or level of the one or more genomic loci to a reference or a set of predicted values; (d) identifying the mammal as having necrotizing enterocolitis, wherein the difference in the methylation status and/or level of the one or more genomic loci in the sample compared to the reference or a set of predicted values indicates the presence of necrotizing enterocolitis in the mammal; and (e) treating the mammal by administering an antibiotic therapy, by feeding intravenously as opposed to by mouth, and/or by performing a surgical treatment.

In some cases, an increase in the level of methylation of the one or more genomic loci in a sample (e.g., a plasma sample, a blood sample, or a stool sample) can indicate the presence of necrotizing enterocolitis in the mammal, or a decrease in the level of methylation of the one or more genomic loci in a sample (e.g., a plasma sample, a blood sample, or a stool sample) can indicate the presence of necrotizing enterocolitis in the mammal. In some cases, a decrease in the level of methylation of at least one of the one or more genomic loci in a sample and an increase in the level of methylation of at least one of the one or more genomic loci in the sample can indicate the presence of necrotizing enterocolitis in the mammal.

In some cases, the reference can be the methylation status and/or level of the one or more genomic loci in a sample (e.g., a plasma sample, a blood sample, or a stool sample) obtained from a mammal (e.g., a human infant) that does not have necrotizing enterocolitis.

In another aspect, this Example provides a method of treating necrotizing enterocolitis in a mammal (e.g., a human infant) comprising: (a) measuring the methylation status and/or level of one or more genomic loci present in a sample (e.g., a plasma sample, a blood sample, or a stool sample) from the mammal prior to a treatment of necrotizing enterocolitis; (b) measuring the methylation status and/or level of one or more genomic loci present in a sample (e.g., a plasma sample, a blood sample, or a stool sample) from the mammal during the treatment of necrotizing enterocolitis; and (c) continuing the treatment if the difference in the methylation status and/or level of the one or more genomic loci between the samples from prior to and during the treatment of necrotizing enterocolitis indicates the subject is responsive to the treatment. In some cases, the method further comprises (d) administering a different treatment to the mammal if the difference in the methylation status and/or level of the one or more genomic loci between the samples from prior to and during the treatment of necrotizing enterocolitis indicates the subject is not responsive to the treatment.

In some cases, an increase in the level of methylation of the one or more genomic loci in the sample indicates that the mammal is responsive to the treatment, or a decrease in the level of methylation of the one or more genomic loci in the sample indicates the mammal is responsive to the treatment. In certain embodiments, a decrease in the level of methylation of at least one of the one or more genomic loci in the sample and an increase in the level of methylation of at least one of the one or more genomic loci in the sample indicates the mammal is responsive to the treatment. In certain embodiments, an increase in the level of methylation of the one or more genomic loci in the sample indicates the mammal is not responsive to the treatment, or a decrease in the level of methylation of the one or more genomic loci in the sample indicates the mammal is not responsive to the treatment. In certain embodiments, a decrease in the level of methylation of at least one of the one or more genomic loci in the sample and an increase in the level of methylation of at least one of the one or more genomic loci in the sample indicates the mammal is not responsive to the treatment.

This Example further provides algorithms for diagnosing and/or monitoring a mammal having necrotizing enterocolitis. In certain embodiments, the algorithm can be used to classify necrotizing enterocolitis of a mammal (e.g., a human infant).

In another aspect, this Example provides a kit for diagnosing, prognosing, and/or monitoring necrotizing enterocolitis in a mammal comprising a means for determining and/or detecting the methylation status of one or more genomic loci. In certain embodiments, the means comprises one or more primers and/or probes for determining and/or detecting the methylation status of the one or more genomic loci.

In certain embodiments, the one or more genomic loci are present within nucleic acids isolated from the sample. In certain embodiments, the one or more genomic loci are present within cell-free nucleic acids isolated from the sample.

Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention pertains. Although methods and materials similar or equivalent to those described herein can be used in the practice or testing of the present invention, suitable methods and materials are described below. All publications, patent applications, patents, and other references mentioned herein are incorporated by reference in their entirety. In case of conflict, the present specification, including definitions, will control. In addition, the materials, methods, and examples are illustrative only and not intended to be limiting.

Other features and advantages of the invention will be apparent from the following detailed description, and from the claims.

Description of this Example

This Example provides methods for diagnosing, prognosing, monitoring, classifying, and/or treating necrotizing enterocolitis. For example, the methods described herein can include determining the methylation status of one or more genomic loci in a sample of a mammal (e.g., a human infant). In some cases, the methods described herein can include the use of an algorithm to diagnose, prognose, monitor, classify, and/or assist in the treatment of necrotizing enterocolitis. Non-limiting embodiments of this Example are described by the present specification and Examples.

Unless defined otherwise, all technical and scientific terms used in this Example generally have their ordinary meanings in the art, within the context of this Example and in the specific context where each term is used. The following references provide one of skill with a general definition of many of the terms used in this Example: Singleton et al., Dictionary of Microbiology and Molecular Biology (2nd ed. 1994); The Cambridge Dictionary of Science and Technology (Walker ed., 1988); The Glossary of Genetics, 5th Ed., R. Rieger et al. (eds.), Springer-Verlag (1991); and Hale & Margham, The HarperCollins Dictionary of Biology (1991). Certain terms are discussed below, or elsewhere in the specification, to provide additional guidance to the practitioner in describing the compositions and methods of this Example and how to make and use them.

The terms “comprise(s),” “include(s),” “having,” “has,” “can,” “contain(s),” and variants thereof, as used herein, are intended to be open-ended transitional phrases, terms or words that do not preclude the possibility of additional acts or structures. The singular forms “a,” “an,” and “the” include plural references unless the context clearly dictates otherwise. The present Example also contemplates other embodiments “comprising,” “consisting of” and “consisting essentially of,” the embodiments or elements presented herein, whether explicitly set forth or not.

As used herein, the use of the word “a” or “an” when used in conjunction with the term “comprising” in the claims and/or the specification may mean “one,” but it is also consistent with the meaning of “one or more,” “at least one,” and “one or more than one.” Still further, the terms “having,” “including,” “containing,” and “comprising” are interchangeable, and one of skill in the art is cognizant that these terms are open ended terms.

The term “about” or “approximately” means within an acceptable error range for the particular value as determined by one of ordinary skill in the art, which will depend in part on how the value is measured or determined, i.e., the limitations of the measurement system. For example, “about” can mean within 3 or more than 3 standard deviations, per the practice in the art. Alternatively, “about” can mean a range of up to 20%, preferably up to 10%, more preferably up to 5%, and more preferably still up to 1% of a given value. Alternatively, particularly with respect to biological systems or processes, the term can mean within an order of magnitude, preferably within 5-fold, and more preferably within 2-fold, of a value.

In certain embodiments, the term “biomarker” refers to a marker (e.g., DNA methylation status) that allows detection of a disease and/or disorder in an individual, including detection of the disease or the disorder in its early stages. In certain embodiments, the term “biomarker” refers to a marker (e.g., DNA methylation status) that allows the characterization of a phenotype of a disease and/or a disorder in an individual. Early stage of a disease, as used herein, refers to the time period between the onset of the disease and the time point at which signs or symptoms of the disease emerge. In certain non-limiting embodiments, the presence, absence, and/or level of a biomarker in a sample of a mammal (e.g., a human) is determined by comparison with a reference control.

The terms “reference sample,” “reference control,” “control,” or “reference,” as used interchangeably herein, refer to a control for a methylation status of a genomic locus that is to be detected in a sample of a mammal. In certain embodiments, a reference sample can be a sample from a healthy individual, e.g., an individual that does not have necrotizing enterocolitis. In certain embodiments, a reference sample can be a sample from a control individual that does not have the disease or phenotype to be detected by a biomarker disclosed herein. In certain embodiments, a control or reference can be the presence, absence, and/or a particular level of a methylation state of a genomic locus in a healthy individual. In certain embodiments, a reference can be a predetermined presence, absence, and/or particular level of a methylation state of a genomic locus that indicates a subject does not have necrotizing enterocolitis. In certain embodiments, a reference can be the methylation status of a locus in an individual having a disease or a phenotype, e.g., an individual that has necrotizing enterocolitis, where the methylation status of the locus is known to be not associated with the disease or the phenotype.

The term “a set of predicted values” refers to the methylation status of certain genomic loci for a sample where that status is not directly measured from the sample. Rather, it is inferred from measurements of other loci for that sample and/or measurements of other samples. The predicted values are inferred using mathematical and/or statistical models, which are constructed under the null hypothesis that the sample whose loci are being predicted has a normal phenotype.
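As an illustration of this definition, the sketch below assumes that the predicted value for one locus is obtained from a straight-line ordinary least-squares fit of that locus against a single other locus across reference samples with a normal phenotype. The function names and beta values are hypothetical, and the models contemplated here are not limited to this form.

```python
from statistics import mean


def fit_predictor(ref_x, ref_y):
    """Fit y = a + b*x by ordinary least squares on normal-phenotype reference samples.

    ref_x: measured methylation of a predictor locus in each reference sample.
    ref_y: measured methylation of the locus to be predicted in the same samples.
    """
    mx, my = mean(ref_x), mean(ref_y)
    sxx = sum((x - mx) ** 2 for x in ref_x)
    if sxx == 0:
        raise ValueError("predictor locus has no variance in the reference set")
    b = sum((x - mx) * (y - my) for x, y in zip(ref_x, ref_y)) / sxx
    a = my - b * mx
    return a, b


def predicted_value(model, x_obs):
    """Predicted methylation of the target locus, given the predictor locus in a new sample."""
    a, b = model
    return a + b * x_obs


# Hypothetical beta values (fraction methylated) in four healthy reference samples.
ref_locus_a = [0.20, 0.30, 0.40, 0.50]
ref_locus_b = [0.45, 0.55, 0.65, 0.75]  # locus B tracks locus A with an offset of 0.25

model = fit_predictor(ref_locus_a, ref_locus_b)
pred = predicted_value(model, 0.35)  # test sample: locus A measured at 0.35
residual = 0.90 - pred               # test sample: locus B measured at 0.90
```

Under this assumed model, a large residual (observed minus predicted) is what would suggest a departure from the normal-phenotype null hypothesis at that locus.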

The term “slightly invasive or non-invasive method” refers to a method that does not involve the removal of tissues by biopsy from the intestinal tract. In certain embodiments, slightly invasive or non-invasive methods, as described herein, include obtaining plasma or stool from a subject.

The term “patient” or “subject,” as used interchangeably herein, refers to any warm-blooded animal, e.g., human or non-human. Non-limiting examples of non-human subjects include mammals, non-human primates, dogs, cats, mice, rats, guinea pigs, rabbits, fowl, pigs, horses, cows, goats, sheep, etc. In certain embodiments, the subject is human.

The term “nucleic acid,” “nucleic acid molecule,” or “polynucleotide” includes any compound and/or substance that comprises a polymer of nucleotides. Each nucleotide is composed of a base, specifically a purine or pyrimidine base (i.e., cytosine (C), guanine (G), adenine (A), thymine (T), or uracil (U)), a sugar (i.e., deoxyribose or ribose), and a phosphate group. In certain embodiments, the nucleic acid molecule is described by the sequence of bases, whereby said bases represent the primary structure (linear structure) of a nucleic acid molecule. The sequence of bases is typically represented from 5′ to 3′. These terms encompass deoxyribonucleic acid (DNA) including, e.g., complementary DNA (cDNA) and genomic DNA, ribonucleic acid (RNA), in particular messenger RNA (mRNA), synthetic forms of DNA or RNA, and mixed polymers comprising two or more of these molecules. The herein described nucleic acid molecule can contain naturally occurring or non-naturally occurring nucleotides. Examples of non-naturally occurring nucleotides include modified nucleotide bases with derivatized sugars or phosphate backbone linkages or chemically modified residues.

The term “isolated” (e.g., isolated genomic DNA) refers to a biological component that has been substantially separated or purified away from other biological components in the cell of the organism in which the component naturally occurs, e.g., other chromosomal and extra-chromosomal DNA and RNA, proteins, and organelles. Nucleic acids, e.g., DNA, that have been “isolated” include nucleic acids purified by standard purification methods.

The term “genomic locus” or “genomic DNA locus,” as used herein, refers to any fixed position in a genome. For example, a genomic locus can refer to a genomic element, a chromosomal region, a gene, a region of a gene, e.g., an exon or intron, a regulatory region of a gene, e.g., a promoter or enhancer, a CpG site, a CpG island, or a CpG island shore. For example, a genomic locus can include one or more CpG sites, e.g., between about 1 and about 100 CpG sites. In certain embodiments, a genomic locus can be of any appropriate length, e.g., between about 1 and about 10,000 nucleotides in length.

As used interchangeably herein, “methylation state,” “methylation profile,” “methylation status,” and “methylation level” refer to the presence, absence, percentage, and/or quantity of methylation at a particular nucleotide, or nucleotides, within a DNA region, e.g., a genomic locus. The methylation status of a particular DNA sequence (e.g., a genomic locus) can indicate the methylation state of every nucleotide in the sequence, the methylation state of any of the nucleotides (e.g., cytosines) in the sequence, the methylation state of a subset of the nucleotides (e.g., of cytosines), the percentage or fraction of methylated cytosines at any particular stretch of nucleotides within the sequence, or the average rate of methylation of all the cytosines (or a subset of the cytosines) present in a nucleic acid.
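Two of the summaries listed above can be computed from per-CpG read counts, as in the following sketch. It assumes, for illustration only, that each cytosine in a region has already been assigned a count of methylated reads and a count of total reads; the input format and the function name are hypothetical. The pooled fraction weights each site by its coverage, whereas the mean of per-site fractions weights every covered site equally.

```python
def region_methylation(site_counts):
    """Summarize methylation of a region from per-site read counts.

    site_counts: list of (methylated_reads, total_reads), one tuple per cytosine.
    Returns (pooled_fraction, mean_of_site_fractions); sites with zero coverage are skipped.
    """
    covered = [(m, t) for m, t in site_counts if t > 0]
    if not covered:
        return None, None
    # Fraction of all cytosine calls in the region that were methylated.
    pooled = sum(m for m, _ in covered) / sum(t for _, t in covered)
    # Average of the per-site methylation fractions.
    mean_of_sites = sum(m / t for m, t in covered) / len(covered)
    return pooled, mean_of_sites


# Hypothetical region with four CpG sites; the third site has no read coverage.
pooled, per_site_mean = region_methylation([(9, 10), (1, 10), (0, 0), (15, 20)])
```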

As used herein, a “methylated nucleic acid molecule” refers to a nucleic acid molecule that contains one or more methylated nucleotides.

As used herein, a “CpG site” or “methylation site” is a nucleotide within a nucleic acid that is susceptible to methylation either by naturally occurring events in vivo or by an event instituted to chemically methylate the nucleotide in vitro.

A “CpG island,” as used herein, describes a segment of a nucleic acid, e.g., a DNA sequence, that has a high frequency of CpG dinucleotides. See, e.g., Illingworth and Bird, FEBS Letters, 2009; 583:1713-1720. For example, Yamada et al. (Genome Research, 2004; 14:247-266) have described a set of standards for determining a CpG island: it must be at least 400 nucleotides in length, have a GC content greater than 50%, and have an OCF/ECF (observed-to-expected CpG frequency) ratio greater than 0.6. Others (Takai et al., Proc. Natl. Acad. Sci. U.S.A., 2002; 99:3740-3745) have defined a CpG island less stringently as a sequence of at least 200 nucleotides in length, having a greater than 50% GC content and an OCF/ECF ratio greater than 0.6.
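The length, GC-content, and OCF/ECF criteria quoted above can be checked computationally. The sketch below assumes the widely used convention that the expected CpG count equals (number of C × number of G) / sequence length, so that the ratio is observed CpG count divided by that expected count; the function names are hypothetical, and the thresholds are parameters so either of the quoted definitions can be applied.

```python
def cpg_island_stats(seq):
    """Return (length, GC fraction, observed/expected CpG ratio) for a DNA sequence."""
    s = seq.upper()
    n = len(s)
    c, g = s.count("C"), s.count("G")
    # str.count is non-overlapping, but "CG" cannot overlap itself, so this is the full count.
    cpg = s.count("CG")
    gc_fraction = (c + g) / n if n else 0.0
    expected = (c * g) / n if n else 0.0  # assumed convention for the expected CpG count
    obs_exp = cpg / expected if expected else 0.0
    return n, gc_fraction, obs_exp


def is_cpg_island(seq, min_length, min_gc=0.5, min_obs_exp=0.6):
    """Apply 'at least min_length', 'GC greater than min_gc', 'ratio greater than min_obs_exp'."""
    n, gc, oe = cpg_island_stats(seq)
    return n >= min_length and gc > min_gc and oe > min_obs_exp


candidate = "CG" * 100  # hypothetical 200-nt CpG-dense sequence
```

Here `is_cpg_island(candidate, min_length=200)` satisfies the less stringent definition, while `is_cpg_island(candidate, min_length=400)` fails the 400-nucleotide length requirement.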

A “CpG island shore,” as used herein, refers to methylation hotspots located a short distance (e.g., less than 2 kb) from CpG islands.
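Given CpG island coordinates, a position can be labeled as island, shore, or neither. The sketch below assumes island intervals on a single chromosome given as half-open (start, end) coordinates and uses the 2 kb distance mentioned above as a strict cutoff; the function name and coordinates are hypothetical.

```python
def annotate_position(pos, islands, shore_width=2000):
    """Classify a coordinate as 'island', 'shore', or 'other'.

    islands: list of (start, end) half-open intervals on one chromosome.
    shore_width: a position is a shore if it lies less than this many bases from an island.
    """
    label = "other"
    for start, end in islands:
        if start <= pos < end:
            return "island"
        # Distance to the nearest island base (the last island base is end - 1).
        distance = start - pos if pos < start else pos - end + 1
        if distance < shore_width:
            label = "shore"
    return label


islands = [(10_000, 11_000)]  # hypothetical island coordinates
```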

The term “methylome,” as used herein, refers to the amount or pattern of methylation at different sites or regions within a population of cells. The methylome can correspond to all of the genome, a subset of the genome (e.g., repeat elements in the genome), or a portion of the subset (e.g., those areas found to be associated with necrotizing enterocolitis). A methylome from plasma can be referred to as a “plasma fluid methylome” or a “plasma fluid DNA methylome.” The plasma fluid methylome is an example of a cell-free methylome that includes cell-free DNA (cfDNA). A methylome from stool can be referred to as a “stool fluid methylome” or a “stool fluid DNA methylome.” The stool fluid methylome is also an example of a cell-free methylome that includes cfDNA.

As used herein, the term “increase” refers to altering positively by at least about 2%, including, but not limited to, altering positively by about 5%, by about 10%, by about 15%, by about 20%, by about 25%, by about 30%, by about 35%, by about 40%, by about 45%, by about 50%, by about 55%, by about 60%, by about 65%, by about 70%, by about 75%, by about 80%, by about 85%, by about 90%, by about 95%, or by about 100%.

As used herein, the terms “reduce,” “reduction,” or “decrease” refer to altering negatively by at least about 2%, including, but not limited to, altering negatively by about 5%, by about 10%, by about 15%, by about 20%, by about 25%, by about 30%, by about 35%, by about 40%, by about 45%, by about 50%, by about 55%, by about 60%, by about 65%, by about 70%, by about 75%, by about 80%, by about 85%, by about 90%, by about 95%, or by about 100%.

As described herein, this Example provides methods for diagnosing, monitoring, classifying, and/or treating necrotizing enterocolitis by analyzing the methylation status of one or more genomic loci in a sample (e.g., a plasma sample or a stool sample) of a mammal (e.g., a human infant). In certain embodiments, the methods can include using an algorithm described herein. In certain embodiments, the methods described herein can allow for the early diagnosis or screening of a subject with necrotizing enterocolitis, e.g., a subject who does not have any symptoms or has only early symptoms of necrotizing enterocolitis.

In certain embodiments, samples obtained for use in the methods described herein can include cfDNA, which carries DNA methylation information from its cell of origin. cfDNA can arise from cellular apoptosis and necrosis, and can also be released through active secretory processes, including the formation of extracellular vesicles. DNA methylation signatures are highly tissue-specific and therefore carry in vivo information about the tissue source of cfDNA. In certain embodiments, the methods described herein can include analyzing cfDNA in a sample (e.g., a plasma sample or a stool sample) to identify genetic phenotypes that are drivers and/or consequences of necrotizing enterocolitis.
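Because methylation signatures are tissue-specific, the share of cfDNA contributed by a given tissue can, in principle, be estimated from reference methylation profiles. The sketch below is an illustrative assumption rather than a method stated in this Example: it models the observed cfDNA profile as a two-tissue mixture, mixture ≈ f × tissue_a + (1 − f) × tissue_b, and solves for f by least squares. The tissue labels, values, and function name are hypothetical.

```python
def estimate_fraction(mixture, tissue_a, tissue_b):
    """Least-squares estimate of f in mixture ~= f*tissue_a + (1 - f)*tissue_b.

    All arguments are per-locus methylation fractions over the same loci.
    Minimizing sum((m - b - f*(a - b))**2) gives f = sum((m - b)*(a - b)) / sum((a - b)**2);
    the result is clipped to [0, 1] because a tissue fraction cannot be outside that range.
    """
    diffs = [a - b for a, b in zip(tissue_a, tissue_b)]
    denom = sum(d * d for d in diffs)
    if denom == 0:
        raise ValueError("reference profiles are identical; fraction is not identifiable")
    num = sum((m - b) * d for m, b, d in zip(mixture, tissue_b, diffs))
    return min(1.0, max(0.0, num / denom))


# Hypothetical reference profiles at three loci.
intestine = [0.9, 0.1, 0.8]
blood = [0.1, 0.9, 0.2]
# A hypothetical cfDNA profile built with 25% intestinal contribution.
cfdna = [0.3, 0.7, 0.35]
fraction = estimate_fraction(cfdna, intestine, blood)
```

Deconvolution over more than two tissues would need a constrained multi-component fit; the two-component case is used here only because it has a closed-form solution.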

The sample from the subject can be collected using any appropriate technique. For example, a blood sample, a plasma sample, or a stool sample can be collected using standard methods. In some cases, the sample can be collected from the subject before the subject has any symptom of necrotizing enterocolitis, i.e., from a non-symptomatic subject. In certain embodiments, the sample can be collected from a non-symptomatic subject who is at high risk of necrotizing enterocolitis (e.g., a preterm baby). In certain embodiments, the sample can be collected from a subject who has previously received or is currently receiving a treatment for necrotizing enterocolitis. In certain embodiments, two or more samples (e.g., two or more, three or more, four or more, five or more, six or more, or seven or more samples) can be obtained before and while the subject is receiving a treatment for necrotizing enterocolitis (e.g., serially obtained samples).

Diagnostic, Prognostic, Classification, and Monitoring Methods of this Example

This Example provides diagnostic and prognostic methods for diseases and/or disorders that are characterized by differential methylation of genomic loci. For example, this Example provides methods for diagnosing, prognosing, classifying, and/or monitoring necrotizing enterocolitis in a subject that include analyzing the methylation status of certain genomic loci.

In certain embodiments, the analyzed genomic loci can include one or more genomic loci that exhibit differential methylation in a sample from a subject that has necrotizing enterocolitis compared to a reference sample. For example, the methods described herein can include assessing the methylation status of one or more genomic loci, e.g., about 5 or more, about 10 or more, about 50 or more, about 100 or more, about 500 or more, about 1,000 or more, about 5,000 or more, about 10,000 or more, about 25,000 or more, about 50,000 or more or about 100,000 or more genomic loci in a sample of a subject. In certain embodiments, the genomic loci can be selected from the genes, or a region within the genes, provided in Example Embodiment I. In certain embodiments, the one or more genomic loci can be one or more promoter regions of one or more genes, one or more exons of one or more genes, one or more introns of one or more genes, one or more CpG sites, one or more CpG islands, one or more CpG island shores, one or more enhancers of one or more genes, or a combination thereof. In certain embodiments, the genomic loci are present on a particular chromosome.

In certain embodiments, this Example provides methods for diagnosing, prognosing, and/or monitoring necrotizing enterocolitis in a subject by detecting the DNA methylation profiles associated with necrotizing enterocolitis. In certain embodiments, the methods described herein can include (a) obtaining a sample from the subject, (b) determining the methylation status of one or more genomic loci present in the sample, e.g., present within cfDNA in a plasma sample or a stool sample, (c) comparing the methylation status of the one or more genomic loci to a reference or a set of predicted values, and (d) diagnosing necrotizing enterocolitis in the subject. In certain embodiments, the difference in the methylation status of the one or more genomic loci in the sample compared to the reference or a set of predicted values indicates the presence of necrotizing enterocolitis in the subject. In certain embodiments, the difference in the methylation status also can indicate the severity of necrotizing enterocolitis.
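Steps (c) and (d) can be sketched as a per-locus comparison against a set of control samples. The sketch below assumes that each locus is summarized as a methylation fraction and that a departure from the reference is measured as a z-score against control subjects; the z-score approach, the cutoff of 3.0, and the data are illustrative assumptions, not a clinical decision rule from this Example.

```python
from statistics import mean, stdev


def flag_differential_loci(sample, reference, z_cutoff=3.0):
    """Return {locus: z-score} for loci whose sample methylation departs from the reference.

    sample: {locus: methylation fraction} for the test subject.
    reference: {locus: [methylation fractions from control subjects]}.
    """
    flagged = {}
    for locus, value in sample.items():
        controls = reference.get(locus)
        if not controls or len(controls) < 2:
            continue  # at least two controls are needed to estimate spread
        sd = stdev(controls)
        if sd == 0:
            continue  # no spread in the controls, so a z-score is undefined
        z = (value - mean(controls)) / sd
        if abs(z) >= z_cutoff:
            flagged[locus] = z
    return flagged


# Hypothetical control values and test-subject values.
reference = {"locus1": [0.50, 0.52, 0.48, 0.50], "locus2": [0.10, 0.12, 0.08, 0.10]}
sample = {"locus1": 0.51, "locus2": 0.40}
flagged = flag_differential_loci(sample, reference)
```

In this example only `locus2` is flagged, with a positive z-score indicating higher methylation than in the controls.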

In certain embodiments, the methods described herein for diagnosing, prognosing, and/or monitoring necrotizing enterocolitis in a subject can include (a) obtaining a sample from the subject, (b) determining the level of methylation of one or more genomic loci present in the sample, (c) comparing the level of methylation of the one or more genomic loci to a reference or a set of predicted values, and (d) diagnosing necrotizing enterocolitis in the subject. In certain embodiments, the difference in the level of methylation of the one or more genomic loci in the sample compared to the reference or a set of predicted values indicates the presence of necrotizing enterocolitis in the subject. In certain embodiments, the difference in the methylation level also can indicate the severity of necrotizing enterocolitis.

In certain embodiments, diagnosing necrotizing enterocolitis in the subject can include characterizing a phenotype of the necrotizing enterocolitis, wherein the difference in the methylation status of the one or more genomic loci in the sample compared to the reference or a set of predicted values indicates the phenotype of the necrotizing enterocolitis. In certain embodiments, the phenotype of the necrotizing enterocolitis can include the severity of the necrotizing enterocolitis, prognosis of the necrotizing enterocolitis, molecular expression profile of the necrotizing enterocolitis, responsiveness of the necrotizing enterocolitis to certain treatments, or any combination thereof.

In certain embodiments, the methods described herein for determining whether a subject is at risk of developing necrotizing enterocolitis can include (a) obtaining a sample from the subject, (b) determining the level of methylation of one or more genomic loci present in the sample, (c) comparing the level of methylation of the one or more genomic loci to a reference or a set of predicted values, and (d) determining that the subject is at risk of developing necrotizing enterocolitis, wherein the difference in the level of methylation of the one or more genomic loci in the sample compared to the reference or a set of predicted values indicates that the subject is at risk.

In certain embodiments, diagnosing, prognosing, and/or monitoring of a subject with necrotizing enterocolitis can be based on a higher or lower methylation level of the genomic locus in the sample of the subject relative to the methylation level in a reference sample, e.g., a sample from a subject that does not have necrotizing enterocolitis. In certain embodiments, a difference of greater than about 5%, greater than about 10%, greater than about 15%, greater than about 20%, greater than about 25%, greater than about 30%, greater than about 35%, greater than about 40%, greater than about 45%, greater than about 50%, greater than about 55%, greater than about 60%, greater than about 65%, greater than about 70%, greater than about 75%, greater than about 80%, greater than about 85%, greater than about 90%, or greater than about 95% in the methylation (e.g., level, percentage, and/or fraction) of the one or more genomic loci in a sample obtained from a subject compared to a control can be indicative that the subject has necrotizing enterocolitis or is at risk of developing necrotizing enterocolitis. In certain embodiments, the difference can be a decrease in methylation (e.g., level, percentage, and/or fraction) of the genomic loci in the sample of the subject. Alternatively, the difference can be an increase in methylation (e.g., level, percentage, and/or fraction) of the genomic loci in the sample of the subject. In certain embodiments, the difference can be a decrease in the methylation of a genomic locus and an increase in the methylation of a different genomic locus in the sample obtained from the subject. In certain embodiments, a decrease in the level of methylation of one or more genomic loci in the sample and an increase in the level of methylation of one or more different genomic loci in the sample can indicate the presence of necrotizing enterocolitis.
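The threshold-and-direction logic described above can be sketched as follows. The text does not specify whether a “difference of X%” is relative to the control value or expressed in absolute percentage points; this sketch assumes a relative difference, and uses 10% (one of the listed thresholds) as a default. The function names and values are hypothetical.

```python
def relative_difference(sample_value, control_value):
    """Percent difference of the sample relative to the control; the sign gives the direction."""
    if control_value == 0:
        raise ValueError("control methylation is zero; relative difference is undefined")
    return 100.0 * (sample_value - control_value) / control_value


def classify_loci(sample, control, threshold_pct=10.0):
    """Label each locus present in both inputs as 'increase', 'decrease', or 'unchanged'."""
    labels = {}
    for locus in sample.keys() & control.keys():
        diff = relative_difference(sample[locus], control[locus])
        if diff > threshold_pct:
            labels[locus] = "increase"
        elif diff < -threshold_pct:
            labels[locus] = "decrease"
        else:
            labels[locus] = "unchanged"
    return labels


# Hypothetical methylation fractions.
control = {"A": 0.40, "B": 0.80, "C": 0.50}
sample = {"A": 0.60, "B": 0.60, "C": 0.52}
labels = classify_loci(sample, control)
# A mixed pattern: at least one locus increased and at least one different locus decreased.
mixed_pattern = "increase" in labels.values() and "decrease" in labels.values()
```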

In certain embodiments, diagnosis of a subject with necrotizing enterocolitis can be based on the methylated or unmethylated state of a genomic locus, e.g., a CpG site. In certain embodiments, a genomic locus, e.g., a CpG site, in a sample from a subject diagnosed with necrotizing enterocolitis can be methylated and the genomic locus, e.g., the CpG site, in a reference sample can be unmethylated. In certain embodiments, a genomic locus in a sample from a subject diagnosed with necrotizing enterocolitis can be unmethylated and the genomic locus in a reference sample can be methylated.

Diagnostic, Prognostic, Classification, and Monitoring Methods Using an Algorithm of this Example

This Example also provides diagnostic and prognostic methods for diseases and/or disorders that are characterized by differential methylation of genomic loci by using an algorithm as described, for example, in Example Embodiment J. For example, this Example provides methods for diagnosing, prognosing, classifying, and/or monitoring necrotizing enterocolitis in a subject that include analyzing the methylation status of certain genomic loci and/or genomic fractions.

Methods for Treating Necrotizing Enterocolitis

This Example also provides methods for treating a subject having necrotizing enterocolitis. For example, a mammal (e.g., a human infant) that was identified as having necrotizing enterocolitis as described herein (or identified as being at risk of developing necrotizing enterocolitis as described herein) can be administered one or more antibiotic therapies, can be fed intravenously rather than by mouth, or a combination thereof to treat necrotizing enterocolitis. Examples of antibiotic therapies that can be used as described herein include, without limitation, ampicillin, gentamicin, vancomycin, cefepime, and metronidazole. In some cases, a mammal (e.g., a human infant) that was identified as having necrotizing enterocolitis as described herein (or identified as being at risk of developing necrotizing enterocolitis as described herein) can be treated using a surgical procedure to treat necrotizing enterocolitis. Examples of surgical procedures that can be used as described herein include, without limitation, exploratory laparotomy, resection of affected intestine, creation of a stoma, and reanastomosis.

In some cases, the information provided by the methods described herein can be used by a clinician or physician in determining the most effective course of treatment (e.g., preventative or therapeutic) for the subject. A course of treatment refers to the measures taken for a patient after the prognosis or the assessment of increased risk for development of necrotizing enterocolitis is made. For example, when a subject is identified as having an increased risk of developing necrotizing enterocolitis, the physician can determine whether frequent monitoring of DNA methylation changes should be performed as a prophylactic measure. Also, when a subject is diagnosed with necrotizing enterocolitis (e.g., based on the presence of a DNA methylation pattern in a sample from a subject), it can be advantageous to follow such detection with a therapeutic treatment.

In some cases, this Example provides methods for assessing the efficacy of a therapeutic or prophylactic therapy for treating necrotizing enterocolitis in a subject, comprising determining the methylation status of one or more genomic loci present in a sample obtained from a subject prior to the therapy and determining methylation status of the one or more genomic loci present in a sample obtained from the subject at one or more time points during the therapeutic or prophylactic therapy, wherein the therapy is efficacious for treating necrotizing enterocolitis in a subject when there is a change in the presence and/or level of methylation of the one or more genomic loci in the second or subsequent samples, relative to the first sample. In certain embodiments, the first sample is obtained after therapeutic treatment has begun.

In certain embodiments, the methods for monitoring the response in a subject to prophylactic or therapeutic treatment of necrotizing enterocolitis can include measuring the methylation status and/or level of one or more genomic loci in a sample of a subject at a first time-point, administering a therapeutic agent, re-measuring the methylation status and/or level of the one or more genomic loci at a second time-point, comparing the results of the first and second measurements and optionally modifying the treatment regimen based on the comparison. In certain embodiments, the first time-point can be prior to an administration of the therapeutic agent, and the second time-point can be after said administration of the therapeutic agent. In certain embodiments, the first time-point can be prior to the administration of the therapeutic agent to the subject for the first time. In certain embodiments, the dose (defined as the quantity of therapeutic agent administered at any one administration) can be increased or decreased in response to the comparison. In certain embodiments, the dosing interval (defined as the time between successive administrations) can be increased or decreased in response to the comparison, including total discontinuation of treatment. In addition, the methods described herein can be used to determine the efficacy of the therapeutic treatment, wherein a change in the methylation status of certain genomic loci present in a sample of a subject can indicate that the therapeutic treatment regimen can be altered, reduced, and/or stopped.
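The comparison of measurements taken at a first and a second time-point can be sketched as below. The sketch assumes each locus is summarized as a methylation fraction and that a change is reported when the absolute difference (in fraction units) reaches a hypothetical cutoff of 0.05; it only reports which loci changed and in which direction, and does not encode any rule for modifying a dose or dosing interval.

```python
def methylation_change(baseline, follow_up, min_abs_change=0.05):
    """Return {locus: follow_up - baseline} for loci whose methylation changed by at least the cutoff.

    baseline: {locus: methylation fraction} at the first time-point.
    follow_up: {locus: methylation fraction} at the second time-point.
    """
    changes = {}
    for locus in baseline.keys() & follow_up.keys():
        delta = follow_up[locus] - baseline[locus]
        if abs(delta) >= min_abs_change:
            changes[locus] = delta
    return changes


# Hypothetical serial measurements before and after administration of a therapeutic agent.
baseline = {"L1": 0.70, "L2": 0.30}
follow_up = {"L1": 0.45, "L2": 0.32}
changes = methylation_change(baseline, follow_up)
```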

Assays of this Example

This Example also provides assays and/or methods for determining the DNA methylation status and/or level of genomic loci that correlate with the presence, absence, and/or severity of necrotizing enterocolitis. In some cases, the assay method can include comparing the methylation status and/or level of genomic loci present in a sample from a subject that has necrotizing enterocolitis to the methylation status and/or level of genomic loci in a sample from a healthy subject to determine the methylation pattern, as described above, that correlates with the presence of necrotizing enterocolitis. In some cases, the assay methods can include comparing the methylation status and/or level of genomic loci in a sample from a subject that has necrotizing enterocolitis at an early stage to the methylation status and/or level of genomic loci in a sample from a subject that has necrotizing enterocolitis at a late stage to determine the methylation status and/or level that correlates with the different stages and/or severity of necrotizing enterocolitis.
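One way to compare groups of samples locus by locus is to rank loci by a two-sample test statistic. The sketch below assumes Welch's t statistic as that test and per-sample methylation fractions as input; the test choice, function names, and data are illustrative assumptions, and a real analysis would also need to address multiple testing across many loci.

```python
import math
from statistics import mean, variance


def welch_t(group1, group2):
    """Welch's t statistic comparing mean methylation of two groups at one locus."""
    diff = mean(group1) - mean(group2)
    se = math.sqrt(variance(group1) / len(group1) + variance(group2) / len(group2))
    if se == 0:
        # No within-group spread: the statistic is zero or unbounded in the direction of diff.
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / se


def rank_loci(cases, controls):
    """cases/controls: {locus: [per-sample methylation]}; returns (locus, t) sorted by |t|, largest first."""
    scores = {locus: welch_t(cases[locus], controls[locus])
              for locus in cases.keys() & controls.keys()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Hypothetical data: three affected subjects and three healthy subjects at two loci.
cases = {"X": [0.80, 0.82, 0.78], "Y": [0.50, 0.55, 0.45]}
controls = {"X": [0.40, 0.42, 0.38], "Y": [0.50, 0.48, 0.52]}
ranking = rank_loci(cases, controls)
```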

DNA Isolation Techniques of this Example

In certain embodiments, the methods described herein can include isolating nucleic acid from a sample (e.g., a plasma sample or a stool sample) obtained from a subject. Any appropriate technique can be used to isolate nucleic acids from a sample. For example, isolation of DNA from a plasma sample can be performed by extraction methods using organic solvents such as a mixture of phenol and chloroform, followed by precipitation with ethanol (see, for example, J. Sambrook et al., “Molecular Cloning: A Laboratory Manual,” 1989, 2nd Ed., Cold Spring Harbor Laboratory Press: New York, N.Y.). Additional non-limiting examples include salting out DNA extraction (see, for example, P. Sunnucks et al., Genetics, 1996, 144:747-756; and S. M. Aljanabi and I. Martinez, Nucl. Acids Res. 1997, 25:4692-4693), the trimethylammonium bromide salts DNA extraction method (see, for example, S. Gustincich et al., BioTechniques, 1991, 11:298-302), and the guanidinium thiocyanate DNA extraction method (see, for example, J. B. W. Hammond et al., Biochemistry, 1996, 240:298-300). There are also numerous commercially available kits that can be used to extract DNA from biological fluids (e.g., plasma samples) or cells. For example, Qiagen's Gentra PureGene Cell Kit, QIAamp Circulating Nucleic Acid Kit, QIAamp DNA Mini Kit, DNeasy Blood and Tissue Kit, or QIAamp DNA Blood Mini Kit (Qiagen, Hilden, Germany), GenomicPrep™ Blood DNA Isolation Kit (Promega, Madison, Wis.), and GFX™ Genomic Blood DNA Purification Kit (Amersham, Piscataway, N.J.) can be used to obtain DNA from a sample from a subject.

Methylation Detection Techniques of this Example

Various methylation analysis procedures are known in the art, and can be used with the methods described herein. These assays allow for determination of the methylation state of one or more genomic loci, e.g., one or more CpG sites or CpG islands within a nucleic acid obtained from a sample. In addition, the methods can be used to quantify the methylation of a genomic locus. Such assays involve, among other techniques, DNA sequencing of bisulfite-treated DNA, PCR (for sequence-specific amplification), digital PCR, and use of methylation-sensitive restriction enzymes.

In certain embodiments, methylation-specific PCR can be used to determine the methylation status of a genomic locus. Methylation-specific PCR is based on a chemical reaction of sodium bisulfite with DNA that converts unmethylated cytosines, e.g., of CpG dinucleotides, to uracil or UpG, followed by traditional PCR. Methylated cytosines are not converted in this process, and primers can be designed to overlap the methylation site, e.g., CpG site, of interest, thereby allowing one to determine the methylation status of the methylation site as methylated or unmethylated. Additionally, restriction enzyme digestion of PCR products amplified from bisulfite-converted DNA may be used, e.g., by using the method described by Sadri & Hornsby (Nucl. Acids Res. 1996; 24:5058-5059) or COBRA (Combined Bisulfite Restriction Analysis) (Xiong & Laird, Nucleic Acids Res. 1997; 25:2532-2534).
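The effect of bisulfite treatment on a template can be simulated in silico, which shows why methylation-specific primers distinguish the two states. The sketch below considers only one strand, assumes the methylated cytosine positions are known in advance (0-based), and represents the post-PCR result, where uracil is copied as thymine; the function name and sequence are hypothetical.

```python
def bisulfite_convert(seq, methylated_positions):
    """Return the top-strand sequence as read after bisulfite treatment and PCR.

    Unmethylated C is deaminated to U, which PCR copies as T; a C at any position
    listed in methylated_positions (0-based) is protected and remains C.
    """
    out = []
    for i, base in enumerate(seq.upper()):
        if base == "C" and i not in methylated_positions:
            out.append("T")
        else:
            out.append(base)
    return "".join(out)


template = "ACGTTCGAC"  # hypothetical template with CpG sites at positions 1 and 5
methylated_product = bisulfite_convert(template, {1})   # only the CpG at position 1 is methylated
unmethylated_product = bisulfite_convert(template, set())
```

The two products differ only at position 1 (C versus T), which is where a primer would be placed to discriminate methylated from unmethylated template; the non-CpG cytosine at position 8 is converted in both.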

In certain embodiments, whole genome bisulfite sequencing, which is a high-throughput genome-wide analysis of DNA methylation, can be used to determine the methylation status of multiple genomic loci. It is based on sodium bisulfite conversion of genomic DNA, as described above, which is then sequenced on a next-generation sequencing platform. The sequences obtained are then aligned to the reference genome to determine the methylation states of cytosines, e.g., of CpG dinucleotides, present within the analyzed genomic loci based on mismatches resulting from the conversion of unmethylated cytosines into uracil.
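After alignment, each CpG can be assigned a methylation fraction by counting reads that retain C (methylated) versus reads that show T (unmethylated, converted) at the reference cytosine. The sketch below simplifies heavily: it assumes top-strand reads with gap-free alignments given as (start, sequence) pairs, ignores base qualities and the opposite strand, and uses hypothetical names and data.

```python
def call_cpg_methylation(reference, reads):
    """Per-CpG methylation fraction from bisulfite reads aligned to the top strand.

    reference: reference sequence.
    reads: list of (start, read_sequence) with 0-based, gap-free alignments.
    At a reference C in a CpG, a read C counts as methylated and a read T as unmethylated;
    any other base is ignored.
    """
    ref = reference.upper()
    cpg_sites = [i for i in range(len(ref) - 1) if ref[i:i + 2] == "CG"]
    counts = {i: [0, 0] for i in cpg_sites}  # site -> [methylated, unmethylated]
    for start, read in reads:
        for offset, base in enumerate(read.upper()):
            pos = start + offset
            if pos in counts:
                if base == "C":
                    counts[pos][0] += 1
                elif base == "T":
                    counts[pos][1] += 1
    return {i: m / (m + u) for i, (m, u) in counts.items() if m + u > 0}


reference = "ACGTTCGAC"  # hypothetical reference with CpG sites at positions 1 and 5
reads = [(0, "ACGTTTGA"), (0, "ATGTTCGA"), (1, "CGTTTG")]
site_methylation = call_cpg_methylation(reference, reads)
```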

In certain embodiments, genome-wide DNA methylation profiling can be performed using commercially available arrays, thereby allowing the interrogation of multiple genomic loci, e.g., multiple CpG sites. Non-limiting examples of such arrays include HumanMethylation BeadChips (Illumina, San Diego, Calif.) and Infinium MethylationEPIC kit (Illumina). Additional methods for analyzing the methylation state of multiple genomic loci are provided in Yong et al., Epigenetics & Chromatin 2016; 9:26, which is incorporated by reference herein.
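Array-based methylation is commonly summarized per probe from a methylated intensity (M) and an unmethylated intensity (U), either as a beta value, M / (M + U + offset), or as an M-value, log2((M + α) / (U + α)). The sketch below computes both. The offset of 100 and α of 1 are commonly cited defaults assumed here for illustration; they are not specified in this Example, and the intensities are hypothetical.

```python
from math import log2


def beta_value(methylated, unmethylated, offset=100):
    """Fraction methylated from probe intensities; the offset stabilizes low-intensity probes."""
    return methylated / (methylated + unmethylated + offset)


def m_value(methylated, unmethylated, alpha=1):
    """log2 ratio of methylated to unmethylated intensity; alpha avoids division by zero."""
    return log2((methylated + alpha) / (unmethylated + alpha))


# Hypothetical intensities for two probes.
probe_beta = beta_value(900, 0)
probe_m = m_value(1023, 255)
```

Beta values are bounded between 0 and 1 and read directly as a methylated fraction, whereas M-values are unbounded, which is why the latter are often preferred for statistical testing.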

Kits of this Example

This Example provides kits for diagnosing, monitoring, classifying, and/or treating a subject with necrotizing enterocolitis. The kits described herein can comprise a means for determining and/or detecting the methylation status of one or more genomic loci.

Kits described herein can include, without limitation, packaged probe and primer sets (e.g., TaqMan probe/primer sets), arrays/microarrays, which further contain one or more probes, primers, or other detection reagents for determining the methylation state and/or level of one or more genomic loci. For example, a kit described herein can include one or more probes or primers for detecting the methylation state of one or more genomic loci. In certain embodiments, the one or more genomic loci comprise a CpG site. In certain embodiments, one or more of the genomic loci do not comprise a CpG site. For example, about 5% or more, about 10% or more, about 15% or more, about 20% or more, about 25% or more, about 30% or more, about 35% or more, about 40% or more, about 45% or more, about 50% or more, about 55% or more, about 60% or more, about 65% or more or about 70% or more of the one or more genomic loci detected by the primers or probes can comprise one or more CpG sites.

In certain non-limiting embodiments, a primer and/or probe described herein can be at least about 10 nucleotides or at least about 15 nucleotides or at least about 20 nucleotides in length, and/or up to about 200 nucleotides or up to about 150 nucleotides or up to about 100 nucleotides or up to about 75 nucleotides or up to about 50 nucleotides in length.

In a further non-limiting embodiment, the oligonucleotide primers and/or probes can be immobilized on a solid surface or support, for example, on a nucleic acid microarray, wherein the position of each oligonucleotide primer and/or probe bound to the solid surface or support is known and identifiable.

In certain non-limiting embodiments, the kits described herein can additionally include other components such as a buffer, enzymes such as DNA polymerases or ligases, nucleotides such as deoxynucleotide triphosphates, positive control sequences, and/or negative control sequences necessary to carry out an assay or reaction to detect the methylation state of a genomic locus.

In certain embodiments, the kits described herein can include a container comprising one or more probes and/or primers for detecting the methylation state of one or more genomic loci. In certain embodiments, the kits further include instructions for use, e.g., the instructions can describe that a particular methylation status of a genomic locus is indicative of necrotizing enterocolitis in a subject. The instructions can be printed directly on the container (when present), or as a label applied to the container, or as a separate sheet, pamphlet, card or folder supplied in or with the container.

Reports, Programmed Computers, and Systems of this Example

In certain embodiments, a diagnosis and/or monitoring of necrotizing enterocolitis in a subject based on the methylation status of one or more genomic loci as described herein can be referred to herein as a “report.” A tangible report can optionally be generated as part of a testing process (which can be interchangeably referred to herein as “reporting,” or as “providing” a report, “producing” a report or “generating” a report).

Examples of tangible reports can include, without limitation, reports in paper (such as computer-generated printouts of test results) or equivalent formats and reports stored on computer readable medium (such as a CD, USB flash drive or other removable storage device, computer hard drive, or computer network server, etc.). Reports, particularly those stored on computer readable medium, can be part of a database, which can optionally be accessible via the internet (such as a database of patient records or genetic information stored on a computer network server, which can be a “secure database” that has security features that limit access to the report, such as to allow only the patient and the patient's medical practitioners to view the report while preventing other unauthorized individuals from viewing the report, for example). In addition to, or as an alternative to, generating a tangible report, reports can also be displayed on a computer screen (or the display of another electronic device or instrument).

A report can include, for example, an individual's medical history, or can just include size, presence, absence, or levels of one or more markers (for example, a report on computer readable medium such as a network server can include hyperlink(s) to one or more journal publications or websites that describe the medical/biological implications). Thus, for example, the report can include information of medical/biological significance as well as optionally also including information regarding the methylation status of relevant genomic loci, or the report can just include information regarding the methylation status of relevant genomic loci without other medical/biological significance.

A report can further be “transmitted” or “communicated” (these terms can be used herein interchangeably), such as to the individual who was tested, a medical practitioner (e.g., a doctor, nurse, clinical laboratory practitioner, genetic counselor, etc.), a healthcare organization, a clinical laboratory, and/or any other party or requester intended to view or possess the report. The act of “transmitting” or “communicating” a report can be by any means known in the art, based on the format of the report. Furthermore, “transmitting” or “communicating” a report can include delivering a report (“pushing”) and/or retrieving (“pulling”) a report. For example, reports can be transmitted/communicated by various means, including being physically transferred between parties (such as for reports in paper format) such as by being physically delivered from one party to another, or by being transmitted electronically or in signal form (e.g., via e-mail or over the internet, by facsimile, and/or by any wired or wireless communication methods known in the art) such as by being retrieved from a database stored on a computer network server, etc.

In certain embodiments, the disclosed subject matter provides computers (or other apparatus/devices such as biomedical devices or laboratory instrumentation) programmed to carry out the methods described herein, e.g., to execute the algorithm disclosed in this Example (see Example Embodiment J). In certain embodiments, the system can be controlled by the individual and/or their medical practitioner, in that the individual and/or their medical practitioner requests the test, receives the test results, and (optionally) acts on the test results to reduce the individual's risk of necrotizing enterocolitis or to treat the individual, such as by implementing a necrotizing enterocolitis management system.

Example Embodiment I of Example 5: Whole Genome Bisulfite Sequencing of Laser-Captured Enterocytes from NEC-Affected Colon and Ileum

This Example Embodiment I identifies DNA methylation differences in enterocyte genomic DNA that may serve as biomarkers for NEC and also provides new insights into disease pathogenesis as well as tissue origin (colon or ileum). These experiments confirm that completely unbiased analysis of DNA methylation in an enterocyte cell-type-specific fashion can be performed. The goal of these studies was to differentiate patients with NEC from controls, thereby identifying putative diagnostic biomarkers that may be present in stool.

It is challenging to perform laser capture microdissection (LCM) followed by whole genome bisulfite sequencing (WGBS) on limited tissue samples. However, FIGS. 40 and 41 demonstrate that robust data can be generated using this approach. Specifically, a total of n=40 bisulfite sequencing libraries were prepared from individual tissue samples following laser capture microdissection of enterocytes: n=12 from normal colon, n=10 from NEC colon, n=11 from normal ileum and n=7 from NEC ileum. The cellular make-up of the ileum differs from that of the colon. NEC occurs most commonly in the distal ileum but also affects the colon, so enterocytes from both tissues were analyzed to determine whether differences exist in these cell types. FIG. 40 shows images of neonatal gut epithelial cells before and after LCM. The resulting DNA samples were subjected to WGBS, and a total of 5,162,557,910 aligned read pairs were sequenced across all samples. Sequencing data were aligned to the GRCh38 reference genome. As shown in FIG. 41, unsupervised clustering of the resulting WGBS data identified numerous DNA methylation changes associated with NEC when compared with control samples.

Note that control tissue samples were in fact healed NEC tissue, obtained during surgical re-anastomosis. These samples were the only readily available source of human tissue for this study design, and the practice of using these controls is appropriate in the NEC field. Unlike older children and adults, the average premature infant does not undergo bowel resection or routine endoscopy. Control bowel is never resected from a healthy infant, and surgeons very rarely take non-necrotic margins from patients with NEC because they aim to preserve as much bowel as possible. Bowel can be resected for non-inflammatory conditions such as intestinal atresia, but age-matching such patients to infants with NEC is nearly impossible because those patients are usually delivered at term; collecting such samples would also require multi-center funding.

NEC samples were obtained across a corrected gestational age span of approximately 5 weeks. Given the challenges associated with consenting subjects and obtaining samples, samples were combined into case and control groups for analysis, and corrected gestational age was not included as a variable. Furthermore, because of the manner in which individuals had been consented and samples obtained prior to the beginning of this study, complete information regarding neonatal sex was not available for all samples. The analysis therefore focused only on autosomal data, to avoid the complication of X-inactivation and minimize the impact of sex differences.

Analysis of the data identified differentially methylated CpG sites that differ between NEC and control colon and between NEC and control ileum. These are summarized in FIGS. 42A and 42B, respectively. A total of 5352 genomic loci on autosomes containing gene bodies (introns/exons) or promoter regions showed a difference in average methylation rate of at least 0.1 between NEC and control colon samples, compared with 2785 such loci between NEC and control ileum samples. The focus was on autosomal markers to avoid potentially confounding influences of neonatal sex caused by X-chromosome inactivation; including both males and females also maximized the number of samples available for comparison between NEC and control tissues. At the level of single CpG sites, 38,843 differentially methylated sites were found between NEC and control colon at an adjusted p value (q value) of <0.05 (p value <0.00043). Notably, far fewer such sites (n=652) were found at the same level of significance (q<0.05) when comparing NEC and control ileum.
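The site-level filtering and ranking described above can be sketched as follows. The text does not state which multiple-testing correction produced the q values; this minimal Python sketch assumes the Benjamini-Hochberg procedure, and the function names (`bh_qvalues`, `call_dmcs`) and the optional methylation-difference threshold are illustrative only. Under that assumption, a reported raw p-value cutoff (e.g., p<0.00043 for q<0.05) corresponds to the largest raw p value whose q value falls below the threshold.

```python
import numpy as np


def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted p values (q values), returned in input order."""
    p = np.asarray(pvalues, dtype=float)
    n = p.size
    order = np.argsort(p)
    # p_(k) * n / k for the k-th smallest p value
    ranked = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity: q_(k) = min over j >= k of p_(j) * n / j
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    q = np.empty(n)
    q[order] = np.minimum(ranked, 1.0)
    return q


def call_dmcs(pvalues, meth_diff, q_cutoff=0.05, min_abs_diff=0.0, top_n=30):
    """Filter and rank differentially methylated CpGs (hypothetical helper).

    Returns (indices of passing sites sorted by q value, truncated to top_n;
    q values for all sites; largest raw p value with q < q_cutoff, or None).
    meth_diff is in percentage points (case minus control), as in the tables.
    """
    p = np.asarray(pvalues, dtype=float)
    d = np.asarray(meth_diff, dtype=float)
    q = bh_qvalues(p)
    significant = q < q_cutoff
    keep = np.flatnonzero(significant & (np.abs(d) >= min_abs_diff))
    ranked = keep[np.argsort(q[keep], kind="stable")]
    p_cut = float(p[significant].max()) if significant.any() else None
    return ranked[:top_n], q, p_cut
```

For example, `call_dmcs(pvalues, meth_diff, top_n=30)` would return the indices of the 30 most significant sites, similar in spirit to the ranking used for the top-30 tables; the actual statistical pipeline used to produce those tables may differ.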

Further analysis identified the top 30 CpG sites, determined by LCM-coupled WGBS, with the most highly significant differences (q value) between NEC and control colon and the top 30 CpG sites with the most highly significant differences between NEC and control ileum. These are shown in Table 12A and Table 12B. The lists of CpG sites represent examples of specific DNA methylation differences that may be detected in the relevant biological samples listed herein, demonstrating that DNA methylation differences such as those listed and many others exist and can be used to differentiate samples as described herein, for example, as described using procedures similar to those set forth in Example Embodiment J.

TABLE 12A: The top 30 CpG sites, determined by LCM-coupled WGBS, with the most highly significant differences (q value) between NEC and control colon.
chr    coordinate pvalue    qvalue      meth.diff (NEC-Control)
chr17  11019015   1.20E−21  8.39E−15    40.2966432
chr2   86062716   2.67E−17  9.34E−11    32.2328187
chr4   115491856  5.48E−16  1.28E−09    40.7954545
chr14  52272903   1.95E−15  3.41E−09    72.8461538
chr10  21066470   1.40E−14  1.32E−08    35.8379475
chr3   175224788  1.33E−14  1.32E−08    37.8378378
chr5   24263140   1.51E−14  1.32E−08    56.26401
chr4   58967547   2.02E−14  1.57E−08    38.961039
chr2   79966760   3.05E−14  2.14E−08    55.4058265
chr4   22556420   3.43E−14  2.18E−08    45.3261808
chr5   17911285   3.99E−14  2.33E−08    33.6363636
chr2   119409733  4.60E−14  2.47E−08    53.9423547
chr20  38460167   9.79E−14  4.89E−08    42.204006
chr3   101804213  1.31E−13  5.91E−08    51.4418811
chr7   67694175   1.35E−13  5.91E−08    64.7450111
chr6   121426895  1.44E−13  5.94E−08    63.5005066
chr12  101416276  1.81E−13  7.02E−08    42.526497
chr6   78594584   2.13E−13  7.85E−08    50.2966259
chr11  12359400   2.47E−13  8.05E−08    38.5416667
chr6   81046717   2.53E−13  8.05E−08    31.9619471
chr15  63678136   2.37E−13  8.05E−08    55.5917004
chr1   167097455  3.50E−13  1.02E−07    66.4761905
chr15  81380112   4.30E−13  1.20E−07    44.6670538
chr2   150152356  4.63E−13  1.25E−07    56.978022
chr6   167440526  6.28E−13  1.47E−07    47.9513444
chr16  63305874   7.40E−13  1.67E−07    40.3446227
chr10  99591101   8.02E−13  1.70E−07    60.5306799
chr5   22585518   7.92E−13  1.70E−07    50.7375776
chr8   61783188   8.72E−13  1.80E−07    58.4869432
chr19  53612831   1.15E−12  2.24E−07    51.3878279

TABLE 12B: The top 30 CpG sites, determined by LCM-coupled WGBS, with the most highly significant differences (q value) between NEC and control ileum.
chr    coordinate pvalue    qvalue      meth.diff
chr2   70827641   2.11E−23  1.85E−16    −95.240642
chr1   100785163  4.60E−21  2.02E−14    −52.534562
chr1   100762620  2.21E−17  6.47E−11    −59.166667
chr1   143641173  6.27E−17  1.38E−10    34.0425532
chr6   73558525   9.22E−17  1.62E−10    84.6153846
chr11  10964709   5.72E−16  8.36E−10    39.4366197
chr17  53706550   4.09E−15  5.13E−09    −71.428571
chr1   100785162  1.64E−14  1.80E−08    −47.682119
chr10  53230713   2.29E−13  2.23E−07    36
chr8   27077249   3.38E−13  2.96E−07    94.1176471
chr5   36566389   5.11E−13  4.08E−07    31.1688312
chr5   60868950   6.00E−13  4.18E−07    −72.093023
chr7   29003503   6.20E−13  4.18E−07    35.7142857
chr1   100768520  1.11E−12  6.93E−07    −55.769231
chr11  47812794   1.21E−12  7.05E−07    −86.486486
chr4   147431181  1.47E−12  8.06E−07    −95.238095
chr1   67009294   1.81E−12  9.33E−07    −57.823129
chr2   45853090   2.60E−12  1.27E−06    94.4444444
chr13  33070399   3.26E−12  1.51E−06    −70.731707
chr14  68071519   4.29E−12  1.88E−06    −80.090909
chr13  25087158   4.95E−12  2.07E−06    −88.888889
chr1   100750518  9.64E−12  3.68E−06    −58.558559
chr14  22548745   9.48E−12  3.68E−06    73.9130435
chr19  7737210    1.05E−11  3.84E−06    −41.052632
chr2   54973902   1.29E−11  4.52E−06    30.6451613
chr5   61059457   1.71E−11  5.78E−06    −59.701493
chr1   123215253  1.80E−11  5.86E−06    35.5555556
chr5   116099451  1.91E−11  5.86E−06    −88.333333
chr6   33559051   1.94E−11  5.86E−06    −76.086957
chr3   78587902   2.03E−11  5.94E−06    −85.714286

Targeted Genome-Wide Analysis of NEC-Specific DNA Methylation in Tissue Sections from Neonatal Ileum and Colon

Targeted genome-wide bisulfite DNA sequencing was performed on histopathological sections, obtained from the tips of the villi or from the crypts, of NEC and control neonatal gut samples (both colon and ileum). This method delivered higher sequencing read depth per dollar than WGBS in well-characterized regions of the human genome, including promoters, exons, introns, CpG islands, CpG island shores and enhancers, which together cover about 5.5 million CpG sites. This approach quantified DNA methylation levels within genes involved in known biological pathways.

Specifically, targeted genome-wide bisulfite sequencing was carried out on DNA extracted from histological tissue sections obtained from n=8 NEC colon, n=13 control colon, n=5 NEC ileum and n=9 control ileum samples. Bisulfite-converted DNA libraries underwent targeted capture using the SeqCap Epi CpGiant Enrichment Kit (Roche, Pleasanton, Calif.) and were sequenced to a read depth of ~50×. The Bismark bisulfite read mapper was used to align the libraries against the GRCh38 reference genome and to determine the methylation status of each CpG site. DNA methylation signatures were identified using the beta-binomial test implemented in the R packages methylSig and DSS. A total of 3,126,811,295 aligned read pairs were sequenced across all samples.
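As an illustration of how per-CpG methylation levels, the group differences reported in the tables (meth.diff, in percentage points, NEC minus control), and the locus-level threshold of 0.1 might be computed from Bismark-style methylated/unmethylated read counts, a minimal Python sketch follows. It uses simple unweighted group means and a hypothetical 10× minimum-coverage filter; methylSig and DSS fit beta-binomial models and may estimate group levels differently, so this is not a reimplementation of those packages. Region labels (e.g., gene bodies or promoters) are assumed to come from an external annotation, and all function names are hypothetical.

```python
import numpy as np


def methylation_level(meth_counts, unmeth_counts, min_coverage=10):
    """Per-sample methylation fraction at each CpG (rows = CpGs, columns = samples).

    Entries with fewer than min_coverage reads (a hypothetical threshold) become NaN.
    """
    m = np.asarray(meth_counts, dtype=float)
    u = np.asarray(unmeth_counts, dtype=float)
    cov = m + u
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(cov >= min_coverage, m / cov, np.nan)


def meth_diff_percent(level, is_case):
    """Unweighted mean case level minus mean control level, in percentage points."""
    level = np.asarray(level, dtype=float)
    is_case = np.asarray(is_case, dtype=bool)
    case_mean = np.nanmean(level[:, is_case], axis=1)
    ctrl_mean = np.nanmean(level[:, ~is_case], axis=1)
    return 100.0 * (case_mean - ctrl_mean)


def flag_regions(region_ids, level, is_case, min_diff=0.1):
    """Regions whose average methylation difference (as a fraction) is >= min_diff."""
    region_ids = np.asarray(region_ids)
    diff = meth_diff_percent(level, is_case) / 100.0
    flagged = []
    for rid in np.unique(region_ids):
        # Average the per-CpG differences over all CpGs assigned to this region
        region_diff = np.nanmean(diff[region_ids == rid])
        if abs(region_diff) >= min_diff:
            flagged.append(str(rid))
    return flagged
```

Using fractions for the locus-level threshold (0.1) and percentage points for single-site differences mirrors the two scales used in the text.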

Note that control tissue samples were in fact healed NEC tissue, obtained during surgical re-anastomosis. These samples were the only readily available source of human tissue for this study design, and the practice of using these controls is appropriate in the NEC field. NEC samples were obtained across a corrected gestational age span of approximately 5 weeks. Given the challenges associated with consenting subjects and obtaining samples, samples were combined into case and control groups for analysis, and corrected gestational age was not included as a variable. Furthermore, because of the manner in which individuals had been consented and samples obtained prior to the beginning of this study, complete information regarding neonatal sex was not available for all samples. The analysis therefore focused only on autosomal data, to avoid the complication of X-inactivation and minimize the impact of sex differences.

As shown in FIG. 43, unsupervised clustering of the resulting targeted bisulfite sequencing data identified numerous DNA methylation changes associated with NEC when compared with control samples. Numerous CpG sites that are differentially methylated between NEC and control colon and between NEC and control ileum were identified. Specifically, 1144 genomic loci containing gene bodies (introns/exons) or promoter regions showed a difference in average methylation rate of at least 0.1 between NEC and control colon samples, compared with 1388 such loci between NEC and control ileum samples. At the level of single CpG sites, 327 differentially methylated sites were found between NEC and control colon at an adjusted p value (q value) of <0.05 (p value <0.00043). Notably, fewer such sites (n=282) were found at the same level of significance (q<0.05) when comparing NEC and control ileum.

The top 30 CpG sites, determined via targeted genome-wide analysis, with the most highly significant differences (q value) between NEC and control colon and the top 30 CpG sites with the most highly significant differences (q value) between NEC and control ileum were identified. These are shown in Table 13A and Table 13B. Again, the lists of CpG sites represent examples of specific DNA methylation differences that may be detected in the relevant biological samples listed herein, demonstrating that DNA methylation differences such as those listed and many others exist and can be used to differentiate samples as described herein, for example, as described using procedures similar to those set forth in Example Embodiment J.

TABLE 13A: The top 30 CpG sites, determined by genome-wide targeted bisulfite sequencing of histological tissue sections, with the most highly significant differences (q value) between NEC and control colon.
chr    coordinate pvalue    qvalue      meth.diff
chr2   218381984  2.46E−12  2.19E−06    −35.326767
chr19  50604303   2.71E−11  1.45E−05    55.0837283
chr7   75985405   2.98E−11  1.45E−05    44.1860465
chr6   224755     9.29E−11  3.31E−05    −33.937735
chr16  85544194   2.51E−10  5.98E−05    −33.923694
chr2   156328627  2.64E−10  5.98E−05    −36.422369
chr15  25176679   3.75E−10  6.26E−05    −35.126778
chr3   183211486  1.34E−09  0.00015263  −34.514436
chr19  50604306   2.52E−09  0.00024029  49.2693651
chr11  76588776   2.79E−09  0.00025528  −37.664924
chr17  79086422   3.82E−09  0.00029847  −30.068785
chr1   226952071  5.07E−09  0.0003385   −32.523562
chr12  34310055   5.14E−09  0.0003385   −34.253185
chr5   157458062  5.01E−09  0.0003385   −40.56775
chr17  48156384   5.63E−09  0.00036127  −32.469752
chr19  50604335   6.15E−09  0.00038688  30.3593704
chr1   36913980   7.14E−09  0.00042878  43.6803937
chr4   87863958   7.14E−09  0.00042878  −36.179812
chr19  50604301   7.53E−09  0.00043353  54.0622169
chr4   87864026   9.08E−09  0.00048062  −35.344262
chr5   42924363   1.02E−08  0.00050657  35.5612968
chr1   1763942    1.27E−08  0.00057572  −30.307887
chr8   41623158   1.27E−08  0.00057572  33.8067593
chr17  4145468    1.66E−08  0.0006958   58.0597015
chr7   37389812   1.65E−08  0.0006958   −35.284934
chr19  941117     1.72E−08  0.00071385  44.6992481
chr11  1486034    1.83E−08  0.00073694  32.6567192
chr10  61869861   2.36E−08  0.00087391  31.7641412
chr8   124994839  2.61E−08  0.00093255  −30.431059
chr17  4145498    2.82E−08  0.00094855  51.9661017

TABLE 13B: The top 30 CpG sites, determined by genome-wide targeted bisulfite sequencing of histological tissue sections, with the most highly significant differences (q value) between NEC and control ileum.
chr    coordinate pvalue    qvalue      meth.diff
chr5   36782610   1.46E−16  8.31E−10    75.8134902
chr9   136767037  2.02E−11  5.74E−05    37.8666667
chr5   36782611   9.68E−11  0.00011858  67.5714472
chr1   1759026    1.25E−10  0.00011858  76.0162602
chr16  23839083   1.16E−10  0.00011858  33.5234386
chr16  87999766   1.61E−10  0.00012579  74.1666667
chr6   106098949  2.69E−10  0.00016954  31.5360412
chr5   133047721  3.46E−10  0.00019645  −46.346154
chr14  22237077   8.18E−10  0.00033552  72.2095238
chr11  1238921    8.27E−10  0.00033552  36.6792929
chr17  8475916    7.38E−10  0.00033552  −34.309441
chr19  2248684    1.77E−09  0.00058746  −50
chr6   170110578  2.26E−09  0.00067503  52.8765352
chr8   41665339   4.23E−09  0.0011452   35.942029
chr5   36783444   7.87E−09  0.00178832  56.4516129
chr12  51392880   7.58E−09  0.00178832  40.3846154
chr5   36783343   1.04E−08  0.00228362  69.2049272
chr6   135722823  1.60E−08  0.00324704  36.3983488
chr10  90319722   1.68E−08  0.00328501  43.5909091
chr18  79225913   1.75E−08  0.00331339  70.713073
chr20  50891154   1.97E−08  0.00344344  51.6445352
chr5   16539524   1.88E−08  0.00344344  51.6109422
chr1   6078881    2.43E−08  0.00373276  69.2307692
chr2   218273094  2.37E−08  0.00373276  41.7998198
chr15  84235722   2.87E−08  0.00388368  36.9852941
chr15  41501738   2.65E−08  0.00388368  −30.643539
chr17  59826115   3.83E−08  0.00483427  39.0368981
chr1   5077162    4.31E−08  0.00519313  31.9605263
chr10  126392102  4.84E−08  0.00519313  46.9061462
chr2   39245224   4.57E−08  0.00519313  −42.878788

Targeted Genome-Wide Analysis of DNA Methylation in Whole Blood from NEC Cases and Controls

Blood samples from NEC patients (n=6) were compared with blood samples from control premature infants (n=6). Additional samples from NEC cases (n=9) and controls (n=7) were sequenced and the data analyzed. These samples did not fully overlap with the tissue samples analyzed, partly because of the availability of tissue and blood samples and partly because of the challenges associated with gathering data from laser-captured tissue, which meant that some tissue samples could not be assayed effectively. The data presented herein are from surgical NEC cases only. The goal was to identify DNA methylation changes in blood samples from affected individuals and to advance the understanding of epigenomic dysregulation in the circulating hematopoietic cell compartment during NEC. As before, the SeqCap Epi CpGiant Enrichment Kit and related protocols described above were used, generating an average read depth of 42× across ~80.5 Mb of the human genome. A total of 1,671,368,776 aligned read pairs were sequenced across all samples.

Analysis of the data identified differentially methylated CpG sites that differ between NEC and control blood samples. These are summarized in FIG. 44. A total of 1071 genomic loci containing gene bodies or promoter regions showed a difference in average methylation rate of at least 0.1 between NEC and control blood samples. At the level of single CpG sites, 1,870 differentially methylated sites were found between NEC and control blood at an adjusted p value (q value) of <0.05 (p value <7.48×10⁻⁵). Further analysis identified the top 30 CpG sites, determined by genome-wide targeted bisulfite sequencing, with the most highly significant differences (q value) between NEC and control blood samples. These are shown in Table 14. Again, the lists of CpG sites represent examples of specific DNA methylation differences that may be detected in the relevant biological samples listed herein, demonstrating that DNA methylation differences such as those listed and many others exist and can be used to differentiate samples as described herein, for example, as described using procedures similar to those set forth in Example Embodiment J.

TABLE 14: The top 30 CpG sites, determined by genome-wide targeted bisulfite sequencing of neonatal whole blood samples, with the most highly significant differences (q value) between NEC and control neonates.
chr    coordinate pvalue    qvalue      meth.diff
chr14  96964828   8.37E−48  5.72E−41    −49.454564
chr14  96964827   1.25E−40  4.28E−34    −50.27117
chr8   142252127  9.27E−29  2.11E−22    −43.839856
chr18  673016     1.84E−25  3.15E−19    −48.047898
chr12  66884475   1.03E−23  1.41E−17    −49.206349
chr18  673017     1.72E−21  1.95E−15    −48.640963
chr19  6004094    2.42E−20  2.36E−14    40.8653846
chr12  20777963   5.47E−19  4.68E−13    −50
chr1   2153138    1.92E−18  1.45E−12    35.1955307
chr7   832620     3.60E−18  2.46E−12    32.9454859
chr18  35271680   6.99E−18  4.34E−12    46.5648855
chr4   165278238  1.54E−17  8.77E−12    −43.776491
chr5   57878551   1.94E−17  1.02E−11    34.1562161
chr1   223230217  5.11E−17  2.49E−11    35.7997186
chr1   223230191  8.28E−16  2.98E−10    33.6101736
chr7   51477426   7.43E−16  2.98E−10    50.3582517
chr4   165278239  8.78E−16  3.00E−10    −42.763623
chr1   97938476   2.45E−15  7.98E−10    −30.130003
chr18  35271887   6.70E−15  2.08E−09    50.5058426
chr5   10377076   3.27E−14  8.01E−09    33.8299617
chr2   144414421  3.28E−14  8.01E−09    −41.004454
chr1   223230234  3.45E−14  8.14E−09    36.5955035
chr2   151292174  8.76E−14  1.84E−08    −31.327801
chr17  768851     8.89E−14  1.84E−08    31.2066503
chr2   43040642   1.04E−13  2.10E−08    −31.634519
chr11  64870691   1.22E−13  2.28E−08    −30.076286
chr2   21228815   1.62E−13  2.84E−08    −38.551119
chr20  31473908   1.81E−13  2.94E−08    −41.565217
chr2   47101189   1.75E−13  2.94E−08    77.1785268
chr11  23021922   2.75E−13  4.28E−08    74.2857143

Targeted Genome-Wide Analysis of DNA Methylation in Stool Samples from NEC Cases and Controls

A discovery-based analysis of DNA methylation in stool samples from NEC-affected neonates and controls was performed. Data were gathered from four cases and four controls. The SeqCap Epi CpGiant Enrichment Kit and related protocols described above were used to generate bisulfite sequencing data.

Analysis of the stool data identified differentially methylated CpG sites that differ between NEC and control samples. These are summarized in FIG. 45. A total of 6579 genomic loci containing gene bodies or promoter regions showed a difference in average methylation rate of at least 0.1 between NEC and control stool samples. At the level of single CpG sites, 28,594 differentially methylated sites were found between NEC and control stool at an adjusted p value (q value) of <0.05 (p value <7.48×10⁻⁵). Further analysis identified the top 30 CpG sites, determined by genome-wide targeted bisulfite sequencing, with the most highly significant differences (q value) between NEC and control stool samples. These are shown in Table 15. Again, the lists of CpG sites represent examples of specific DNA methylation differences that may be detected in the relevant biological samples listed herein, demonstrating that DNA methylation differences such as those listed and many others exist and can be used to differentiate samples as described herein, for example, as described using procedures similar to those set forth in Example Embodiment J.

TABLE 15: The top 30 CpG sites, determined by genome-wide targeted bisulfite sequencing of neonatal stool samples, with the most highly significant differences (q value) between NEC and control neonates.
chr    coordinate pvalue    qvalue      meth.diff
chr4   148261442  3.30E−36  1.43E−29    53.5847129
chr4   148261415  1.45E−35  3.16E−29    51.6383654
chr7   155267735  1.49E−32  2.16E−26    45.4545455
chr1   207486726  2.65E−25  2.89E−19    57.2916667
chr3   176361389  6.81E−25  5.93E−19    −47.540984
chr17  58155314   6.60E−24  4.79E−18    −43.26874
chr1   25892797   2.10E−21  1.31E−15    69.6666667
chr10  129700127  5.96E−21  2.88E−15    −45
chr14  104186550  5.56E−21  2.88E−15    48.2352941
chr22  18080123   1.77E−19  7.69E−14    −54.285714
chr17  81159445   5.11E−19  2.02E−13    −50.833333
chr10  112356734  6.38E−19  2.31E−13    −86.792453
chr4   678098     1.12E−18  3.47E−13    54.8633185
chr12  130281547  1.20E−18  3.48E−13    52.0294024
chr4   109013654  1.47E−18  4.00E−13    82.8841608
chr1   151852673  3.06E−18  7.83E−13    59.6153846
chr11  71615502   4.61E−18  1.11E−12    41.3995726
chr5   110726839  1.20E−17  2.62E−12    43.7357306
chr11  68342746   1.31E−17  2.71E−12    71.9317678
chr16  29289124   4.63E−17  9.16E−12    51.3902427
chr10  129700126  5.04E−17  9.54E−12    −49.39158
chr13  113343082  2.81E−16  4.36E−11    40.3794643
chr17  1888879    3.46E−16  5.02E−11    −40.35225
chr7   95324592   3.41E−16  5.02E−11    50.8015087
chr11  17390274   3.74E−16  5.24E−11    40.4255319
chr16  75494614   3.99E−16  5.42E−11    39.8058252
chr9   2234921    4.66E−16  6.14E−11    63.5017422
chr1   182084066  1.13E−15  1.36E−10    −43.99529
chr10  102201061  1.70E−15  1.94E−10    53.0410357
chr3   176362257  4.72E−15  5.01E−10    −42.105263

Identification of Differentially Methylated Tissue Markers in Stool and Blood

One goal was to explore the overlap between DNA methylation differences identified in NEC versus control intestinal tissue and those identified in NEC versus control stool. It was hypothesized that differentially methylated CpG sites in intestinal tissue would be detectable in stool. Using pilot data generated from the small number of stool samples from NEC patients and controls, the relationship between NEC-specific biomarkers identified in tissue and those identified in stool was explored. A total of n=41 autosomal CpG sites, with adjusted p values of <0.05 and a methylation difference of at least 20%, were identified as differentially methylated in both stool and tissue (ileum or colon). These are markers of interest because of their potential for translation into a stool-based assay for NEC (Table 16).

TABLE 16
Chromosome  Coordinate
chr1        202007351
chr2        74177514
chr2        218273237
chr2        237905493
chr2        156328443
chr2        27199738
chr4        76692730
chr4        8205509
chr5        73813624
chr6        125914370
chr6        160091207
chr6        109294253
chr7        1941242
chr7        4706797
chr7        2735000
chr7        102614072
chr8        137636543
chr11       83456378
chr11       1238921
chr11       130166648
chr15       59401917
chr15       41502938
chr16       23187923
chr16       49969809
chr16       29820860
chr17       59840980
chr17       49294764
chr17       39098668
chr17       80774434
chr19       38267481
chr20       43559198
chr21       45482307
chr10       8378253
chr11       86676450
chr5        170105682
chr2        190791817
chr1        2315672
chr1        54475455
chr12       114666918
chr19       751200
chr8        122778505

Some overlap of NEC-specific differentially methylated sequences between stool and blood samples was identified (FIG. 46). Specifically, a total of n=145 CpG sites were identified as shared between stool and blood, with a methylation difference of at least 20% and q≤0.05. These markers are notable because of their potential to identify blood-based changes in the molecular phenotype of NEC in stool (Table 17).
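The overlap analyses behind Tables 16 and 17 amount to intersecting two filtered sets of CpG sites. A minimal Python sketch, assuming each sample type's results are available as a mapping from (chromosome, coordinate) to (q value, methylation difference in percentage points); the function name `shared_markers` and this data layout are hypothetical. The text does not say whether the direction of change had to agree between sample types, so no concordance check is applied, and the strict `<` comparison at the q-value boundary is an assumption.

```python
AUTOSOMES = {f"chr{n}" for n in range(1, 23)}


def shared_markers(calls_a, calls_b, q_cutoff=0.05, min_abs_diff=20.0,
                   autosomal_only=True):
    """CpG sites passing both filters in two sample types (hypothetical helper).

    Each calls_* argument maps (chrom, coordinate) -> (qvalue, meth_diff_percent).
    """
    def passing(calls):
        # Keep sites that are significant, show a large enough difference,
        # and (optionally) lie on an autosome
        return {
            site for site, (q, diff) in calls.items()
            if q < q_cutoff
            and abs(diff) >= min_abs_diff
            and (not autosomal_only or site[0] in AUTOSOMES)
        }

    return sorted(passing(calls_a) & passing(calls_b))
```

Setting `autosomal_only=True` mirrors the autosome restriction stated for the tissue/stool comparison; the text does not state whether the stool/blood comparison applied the same restriction.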

TABLE 17
Chromosome  Coordinate
chr1        223230217
chr17       4176389
chr19       2607848
chr7        135233855
chr4        1311577
chr11       118344406
chr2        240704772
chr13       40188158
chr21       45470957
chr4        123304818
chr16       89162563
chr1        116987097
chr17       76270311
chr6        139161125
chr1        203289852
chr1        167518392
chr8        266789
chr17       7549609
chr16       18558775
chr7        135233856
chr1        111593228
chr15       42156044
chr1        85298457
chr2        240704741
chr9        124840384
chr1        183195278
chr9        106921553
chr8        143461306
chr2        240704800
chr13       40188108
chr19       16330718
chr6        6614442
chr3        16301690
chr3        16301758
chr2        240704799
chr2        112182833
chr14       64468656
chr14       103140911
chr19       4153717
chr11       76232249
chr6        6614487
chr1        12453505
chr20       32359934
chr7        2076816
chr16       8644675
chr7        2076791
chr1        23949357
chr7        73299956
chr14       56205791
chr7        2076830
chr5        74642861
chr7        2076837
chr7        2076821
chr10       75428630
chr4        24974189
chr2        66444595
chr1        8211944
chr7        2076878
chr12       11724866
chr19       45379921
chr7        2048342
chr4        39481120
chr11       72135742
chr14       102210824
chr17       9361393
chr7        2077145
chr1        2084262
chr11       73272408
chr7        2076851
chr19       13835962
chr16       81495456
chr7        2076970
chr1        220882458
chr12       124506266
chr7        1152618
chr9        35042348
chr7        100431249
chr11       46522145
chr14       102210833
chr17       74531468
chr11       66509471
chr3        10357812
chr4        184775559
chr11       62806931
chr14       22851685
chr16       88975383
chr16       89764825
chr11       46522066
chr11       72135687
chr14       102210805
chr1        8211961
chr8        8374015
chr2        43540209
chr1        26554057
chr7        101718554
chr5        78527242
chr7        2076971
chr4        39481134
chr9        105065030
chr1        220882435
chr3        195154566
chr10       78256178
chr15       69462230
chr8        129988124
chr1        9089676
chr3        194599071
chr1        220882449
chr6        130993602
chr5        151298601
chr7        2076930
chr6        131572840
chr11       3154396
chr3        66442636
chr16       29607174
chr8        129988086
chr20       35093420
chr2        54673911
chr19       17022149
chr7        4714929
chr11       3154467
chr19       1423758
chr4        8200785
chr11       66335268
chr1        220882431
chr1        12714901
chr11       86673628
chr17       27632878
chr4        6939133
chr11       6576981
chr5        177508225
chr9        136917905
chr12       124487335
chr7        2076955
chr4        39481150
chr15       74422826
chr4        24974053
chr8        6970582
chr3        11556527
chr16       929814
chr11       71478431
chr9        136917853
chr2        239210589
chr1        1185856
chr16       29607169
chr13       27054217

Example Embodiment J of Example 5: Estimation of Abnormal Stool Methylome Variation in Targeted Regions for Diagnosis of Necrotizing Enterocolitis

Provided below is an algorithm that can be used to diagnose necrotizing enterocolitis in a subject. The presently disclosed subject matter provides that the methylome(s) of intestinal tissue of a mammal, or of structures therein (e.g., the small intestine), can be affected by certain abnormalities (e.g., necrotizing enterocolitis), and that changes in these methylomes can lead to changes in the methylation patterns of the DNA fragments found in stool, which are released by intestinal tissue. An algorithm was developed to identify changes in the methylation patterns of the stool methylome caused by intestinal tissue phenotypes. The main insight behind this algorithm is that the methylome of the DNA fragments in stool is a mixture of a variety of component methylomes of intestinal origin, and that the proportions of these component methylomes in the mixture vary from subject to subject, even among individuals with a normal intestinal tissue phenotype. By modeling the stool methylome as a linear combination of various component methylomes of intestinal tissue origin, the algorithm can accurately predict the methylation patterns of a new stool sample under the hypothesis that it is from a normal individual. Consequently, the algorithm has high sensitivity for detecting abnormal methylation patterns in a stool sample caused by changes in the methylomes of some intestinal tissues when the sample is from an affected individual. The procedure can be applied with little modification to the diagnosis of NEC using other types of biopsy samples, such as plasma, provided that DNA fragments from the tissues affected by NEC can be found in those samples.

Let i be any CpG site in the human genome, let $z_{i,j}$ be the methylation level of CpG site i in stool sample j, let $p_{i,r,j}$ be the proportion of the r-th component methylome $m_{r,j}$ of intestinal tissue origin in stool sample j at site i, and let $m_{i,r,j}$ be the methylation level of CpG site i in methylome $m_{r,j}$. The hypothesis is:

$z_{i,j} = \sum_{r=1}^{R} p_{i,r,j}\, m_{i,r,j} \qquad (1)$

where $p_{i,r,j} \ge 0$, $0 \le m_{i,r,j} \le 1$, and $p_{i,1,j} + \cdots + p_{i,R,j} = 1$.

It is further assumed that there is a set of CpG sites S such that, for any CpG site i in S and any stool sample j from a normal individual, $m_{i,r,j} = m_{i,r}$ and $p_{i,r,j} = p_{r,j}$.

That is, it is assumed that in any stool sample from a normal individual, the proportions of the different component methylomes in the mixture are the same for all CpG sites in S. It is also assumed that, when restricted to the set of CpG sites S, stool samples from all normal individuals share the same set of component methylomes. These are called restricted reference component methylomes (RRCM) and are denoted $m_1^S, \ldots, m_R^S$, or simply $m_1, \ldots, m_R$ when there is no risk of confusion. For any stool sample j from a normal individual, its methylome restricted to the CpG sites in S can be expressed as a weighted average of the restricted reference component methylomes. More precisely, let $z_j^S$ be the methylome of stool sample j restricted to S; then, for some mixture vector $p_j = [p_{1,j}, \ldots, p_{R,j}]^T$,

$z_j^S = [m_1^S, \ldots, m_R^S]\, p_j \qquad (2)$
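Once the restricted reference component methylomes are stacked as the columns of a matrix, equation (2) is a single matrix-vector product. The minimal sketch below assumes NumPy; the helper name `restricted_stool_methylome` and the numbers are hypothetical and only illustrate the constraints from equation (1) and the form of $z_j^S$.

```python
import numpy as np

def restricted_stool_methylome(M_S, p):
    """Equation (2): methylome of a normal stool sample restricted to S.

    M_S : (|S|, R) array; column r holds the restricted reference
          component methylome m_r^S, with entries in [0, 1].
    p   : (R,) mixture vector p_j, non-negative and summing to 1.
    """
    M_S = np.asarray(M_S, dtype=float)
    p = np.asarray(p, dtype=float)
    if np.any(M_S < 0) or np.any(M_S > 1):
        raise ValueError("methylation levels must lie in [0, 1]")
    if np.any(p < 0) or not np.isclose(p.sum(), 1.0):
        raise ValueError("mixture vector must be non-negative and sum to 1")
    # All sites in S share the same proportions p, so the mixture is M_S @ p.
    return M_S @ p

# Hypothetical example: 4 CpG sites, R = 2 components (e.g., colon and ileum).
M_S = np.array([[0.9, 0.1],
                [0.8, 0.2],
                [0.5, 0.5],
                [0.0, 1.0]])
p_j = np.array([0.25, 0.75])
z_j = restricted_stool_methylome(M_S, p_j)
print(z_j)  # [0.3  0.35 0.5  0.75]
```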

Finally, it is assumed that the set S is the union of two disjoint subsets C and T, where T is the union of K non-empty sets $T_k$, $T = \bigcup_{k=1}^{K} T_k$, and the index k represents the k-th type of abnormal intestinal tissue phenotype. The sets $T_k$ need not be disjoint. Moreover, each $T_k$ is itself the union of two disjoint sets $D_k$ and $V_k$; either $D_k$ or $V_k$ may be empty, but not both. It is assumed that for any stool sample, including one from an abnormal individual, the methylome restricted to the CpG sites in C can always be expressed as a weighted average of the restricted reference component methylomes. That is, $z_j^C = [m_1^C, \ldots, m_R^C]\, p_j$ regardless of whether j is from an abnormal individual. C is called the set of reference CpG sites. On the other hand, for a stool sample l from an abnormal individual, the methylome restricted to the CpG sites in $S = C \cup T$ can no longer be expressed as a weighted average of the restricted reference component methylomes. That is, $w_l^S \ne [m_1^S, \ldots, m_R^S]\, p_l$ for any mixture vector $p_l$. More specifically, for a stool sample l from an individual with the k-th type of abnormal phenotype: (1) $w_l^C = [m_1^C, \ldots, m_R^C]\, p_l$; (2) if $D_k$ is non-empty, then $w_l^{D_k} = [m_{1,k}^{D_k}, \ldots, m_{R,k}^{D_k}]\, p_l$, where $[m_1^{D_k}, \ldots, m_R^{D_k}] \ne [m_{1,k}^{D_k}, \ldots, m_{R,k}^{D_k}]$; and (3) if $V_k$ is non-empty, then $w_l^{V_k} = [m_1^{V_k}, \ldots, m_R^{V_k}]\, q_l$, where $q_l \ne p_l$. In other words, in a stool sample l from an individual with the k-th type of abnormality, if $D_k$ is non-empty, the component methylomes of sample l restricted to $D_k$ differ from the reference component methylomes restricted to $D_k$. If $V_k$ is non-empty, the proportions of the reference component methylomes in this sample restricted to $V_k$ differ from the proportions restricted to C.
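To make the three conditions on an abnormal sample concrete, the hypothetical generator below builds $w_l$ on C, $D_k$, and $V_k$ from a reference component matrix. The function name `abnormal_sample` and all numbers are illustrative assumptions rather than part of the disclosure; the sketch only illustrates the model and is not an estimation procedure.

```python
import numpy as np

def abnormal_sample(M, p, C, D, V, M_D_alt=None, q=None):
    """Hypothetical generator of w_l for the k-th abnormal phenotype.

    M       : (n_sites, R) reference component methylomes (columns m_r).
    p       : mixture vector p_l, which still holds on the reference set C.
    C, D, V : disjoint integer index arrays for C, D_k, and V_k.
    M_D_alt : (len(D), R) altered component methylomes m_{r,k}^{D_k}.
    q       : mixture vector q_l (!= p_l), which holds on V_k.
    """
    w = np.full(M.shape[0], np.nan)  # sites outside C, D_k, V_k stay undefined
    w[C] = M[C] @ p                  # condition (1): reference sites unchanged
    if len(D) > 0:
        w[D] = M_D_alt @ p           # condition (2): components differ on D_k
    if len(V) > 0:
        w[V] = M[V] @ q              # condition (3): proportions differ on V_k
    return w

M = np.array([[0.9, 0.1],
              [0.2, 0.8],
              [0.7, 0.3],
              [0.6, 0.4]])
C, D, V = np.array([0, 1]), np.array([2]), np.array([3])
p_l, q_l = np.array([0.5, 0.5]), np.array([0.2, 0.8])
M_D_alt = np.array([[0.1, 0.3]])
w = abnormal_sample(M, p_l, C, D, V, M_D_alt, q_l)
print(w)  # [0.5  0.5  0.2  0.44]
```

With these values, the normal-hypothesis prediction $M p_l$ is 0.5 at both the $D_k$ site and the $V_k$ site, so the deviations (0.2 and 0.44 observed) come entirely from conditions (2) and (3).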

T is called the target set of CpG sites, D_(k) is called the differential methylation target set, V_(k) is called the copy number variation target set, and T_(k) is called the target set for the k^(th) type of abnormal phenotype.

The main steps of the algorithm of this Example are:

1) Identify the set of reference CpG sites C and the target sets $T_1, \ldots, T_K$ for the K types of abnormal individuals.
2) Estimate the restricted reference component methylomes $m_1, \ldots, m_R$, or R predictor methylomes $n_1, \ldots, n_R$ that are linearly independent combinations of the reference component methylomes, such that $n_r = [m_1, \ldots, m_R]\, q_r$ for R linearly independent mixture vectors $q_1, \ldots, q_R$.
3) (Optional) If the reference component methylomes are available, estimate the proportions of these components at the reference CpG sites C for the test stool samples.
4) Predict the methylation levels of the test stool samples at the target set $T_k$ of CpG sites under the hypothesis that each sample is from a normal individual.
5) Compare the predicted methylation levels at $D_k$ and $V_k$ with the observed methylation levels, and reject the null hypothesis that a test sample is from a normal individual if the observed levels differ significantly from the predicted levels.

The algorithm of this Example can be implemented in a variety of ways. For example, given methyl-seq data for a set of stool samples from normal individuals, the presently disclosed EM algorithm or the data augmentation method can be applied to estimate the component methylomes, and the maximum likelihood method can then be used to estimate the proportions of these component methylomes in the test sample. Below are simple exemplary implementations of the presently disclosed algorithm that use linear regression.

In the simplest implementation of the algorithm of this Example, it is assumed that the restricted methylome of a stool sample from a normal individual can be approximated by a mixture of two restricted reference methylomes, and that estimates of these two reference component methylomes are available. For example, in the implementation below, the stool methylome at the genomic loci of interest is approximated by a mixture of the ileum and colon methylomes. The implementation includes the following steps:

1. Identify the Reference Set C, and the Target Sets T₁, . . . , T_(K).

1.1 Collect methylation data for a set of colon-derived cell samples, a set of ileum-derived cell samples, and a set of stool samples, all from normal individuals. For each type of abnormal individual, collect a set of colon-derived cell samples, a set of ileum-derived cell samples, and a set of stool samples from individuals of that type. All of these samples should be matched for age, race, and other relevant parameters. These are the training data.

1.2 Let $x_{i,j}$ be the observed methylation level of CpG site i in normal colon-derived sample j, $y_{i,l}$ the observed methylation level of CpG site i in normal ileum-derived cell sample l, $s_{x,i}^2$ the sample variance of $x_{i,j}$ over all normal colon-derived samples, and $s_{y,i}^2$ the sample variance of $y_{i,l}$ over all normal ileum-derived cell samples. Identify the set of CpG sites $S_0$ such that, for any $i \in S_0$, both $s_{x,i}^2 < c_0$ and $s_{y,i}^2 < c_0$ for some constant $c_0$. These are CpG sites with stable methylation levels in each type of normal cell.

1.3 Now let $x_{i,j}$ be the observed methylation level of CpG site i in colon-derived sample j and $y_{i,l}$ the observed methylation level of CpG site i in ileum-derived cell sample l, in each case including both normal and abnormal samples, and let $s_{x,i}^2$ and $s_{y,i}^2$ be the corresponding sample variances over all colon-derived samples and all ileum-derived cell samples, normal and abnormal included. Identify the set of CpG sites $S_1$ such that, for any $i \in S_1$: both $s_{x,i}^2 < c_0$ and $s_{y,i}^2 < c_0$ for some constant $c_0$; the statistical test for a difference between $\{x_{i,j_0}: j_0 \text{ is a normal colon-derived sample}\}$ and $\{x_{i,j_k}: j_k \text{ is an abnormal colon-derived sample of type } k\}$ is not significant for any abnormal type k; and the statistical test for a difference between $\{y_{i,j_0}: j_0 \text{ is a normal ileum-derived cell sample}\}$ and $\{y_{i,j_k}: j_k \text{ is an abnormal ileum-derived cell sample of type } k\}$ is not significant for any abnormal type k. These are CpG sites with stable methylation levels in each cell type and no difference in methylation level between normal and abnormal samples. Let $x_i$ be the sample mean of $x_{i,j}$ over all colon-derived samples and $y_i$ the sample mean of $y_{i,l}$ over all ileum-derived cell samples, normal and abnormal included. Identify the subset $C_0$ of $S_1$ such that, for any $i \in C_0$, $|x_i - y_i| > c_1$ for some constant $c_1$. These are CpG sites that are stably methylated in each cell type, show no difference between the normal and abnormal samples of the same cell type, and are differentially methylated between the two cell types.

1.4 Let $x^{C_0}$ be the vector of $x_i$ for all $i \in C_0$ and $y^{C_0}$ the vector of $y_i$ for all $i \in C_0$, where $x_i$ is the mean methylation at site i over all colon-derived samples and $y_i$ is the mean methylation at site i over all ileum-derived cell samples. Because of the way $C_0$ is selected, no CpG site in $C_0$ differs in methylation level between normal and abnormal colon-derived samples, or between normal and abnormal ileum-derived cell samples. Let $z_j^{C_0}$ be the observed methylation levels of the CpG sites in $C_0$ for a stool sample j of the k-th abnormal type (for convenience, a normal stool sample is treated as a sample of the 0th abnormal type). For each sample j of the k-th abnormal type, regress $z_j^{C_0}$ on $x^{C_0}$ and $y^{C_0}$, with the constraints that the intercept is 0 and the coefficients are non-negative and sum to 1, and obtain the residual $e_j^{C_0}$. Identify the subset $C_0^k$ of $C_0$ such that, for any CpG site i in $C_0^k$,

$\frac{e_{i,k}^{2}}{s_{i,k}^{2}} < c_{2}$

and $e_{i,k}^2 < c_3$ for some constants $c_2$ and $c_3$, where $e_{i,k}^2$ is the mean squared difference between the estimated and observed methylation levels of CpG site i over all stool samples of the k-th abnormal type, and $s_{i,k}^2$ is the sample variance of the methylation levels of CpG site i in the same set of stool samples. After this procedure is repeated for each type $k = 0, \ldots, K$, the intersection $C = \bigcap_{k=0}^{K} C_0^k$ is the reference set of CpG sites. These are CpG sites whose methylation levels in normal stool samples and in abnormal stool samples of any type can be accurately predicted from the reference component methylomes of normal individuals.

1.5 Let $T_0 = S_0 \setminus S_1$. Let $x^C$ and $x^{T_0}$ be the vectors of $x_i$ and $x_h$ for all $i \in C$ and $h \in T_0$, respectively, and let $y^C$ and $y^{T_0}$ be the vectors of $y_i$ and $y_h$ for all $i \in C$ and $h \in T_0$, respectively, where $x_i$, $x_h$, $y_i$, and $y_h$ are the mean methylation levels of normal colon-derived or ileum-derived cells at sites i and h. Let $z_j^C$ and $z_j^{T_0}$ be the observed methylation levels of the CpG sites in C and $T_0$, respectively, for a normal stool sample j; let $w_{l_k}^C$ and $w_{l_k}^{T_0}$ be the observed levels at C and $T_0$ for a stool sample $l_k$ from an individual with the k-th type of abnormality; and let $w_{l_g}^C$ and $w_{l_g}^{T_0}$ be the observed levels at C and $T_0$ for a stool sample $l_g$ from an individual with the g-th type of abnormality, where $g \ne k$. For each j, $l_k$, and $l_g$, regress $z_j^C$, $w_{l_k}^C$, and $w_{l_g}^C$, respectively, on $x^C$ and $y^C$, with the constraints that the intercept is 0 and the coefficients are non-negative and sum to 1. Apply the fitted models to $x^{T_0}$ and $y^{T_0}$ to predict $z_j^{T_0}$, $w_{l_k}^{T_0}$, and $w_{l_g}^{T_0}$, respectively, and obtain the differences $e_j^{T_0}$, $e_{l_k}^{T_0}$, and $e_{l_g}^{T_0}$ between the observed and predicted values. Let $e_i$, $e_{i,k}$, and $e_{i,g}$ be the means at CpG site i of the sets of differences $\{e_j^{T_0}: j \text{ is a normal stool sample}\}$, $\{e_{l_k}^{T_0}: l_k \text{ is a stool sample of the } k\text{-th abnormal type}\}$, and $\{e_{l_g}^{T_0}: l_g \text{ is a stool sample of the } g\text{-th abnormal type}\}$, respectively. Identify the subset $T_k$ of $T_0$ such that, for any $i \in T_k$, $|e_i| < c_{2,0}$, $|e_{i,k}| > c_{2,k}$, and $|e_{i,k} - e_{i,g}| > c_{3,k}$ for all $g \ne k$, for some constants $c_{2,0}$, $c_{2,k}$, and $c_{3,k}$. $T_k$ is the target set for the k-th type of abnormal individual. These are the sites at which the methylation of a normal stool sample can be accurately predicted, the observed methylation in a stool sample of the k-th abnormal type deviates from the prediction, and that deviation differs from the deviation in a stool sample of any other abnormal type.
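The site-selection rules in steps 1.2, 1.3, and 1.5 reduce to element-wise threshold tests on per-site statistics. The sketch below is a minimal NumPy illustration; the function names, array layouts, and thresholds are hypothetical. It deliberately omits the normal-versus-abnormal significance tests of step 1.3 and the residual filter of step 1.4, so `differential_sites` covers only the $|x_i - y_i| > c_1$ condition.

```python
import numpy as np

def stable_sites(x, y, c0):
    """Step 1.2: sites whose sample variance is below c0 in both cell types.

    x : (n_sites, n_colon) methylation levels of colon-derived samples.
    y : (n_sites, n_ileum) methylation levels of ileum-derived samples.
    """
    return (np.var(x, axis=1, ddof=1) < c0) & (np.var(y, axis=1, ddof=1) < c0)

def differential_sites(x_mean, y_mean, c1):
    """Step 1.3 (last condition only): sites with |x_i - y_i| > c1."""
    return np.abs(np.asarray(x_mean) - np.asarray(y_mean)) > c1

def target_sites(e0, ek, e_other, c20, c2k, c3k):
    """Step 1.5: boolean mask of T_k within T_0.

    e0      : (n,) mean residuals e_i over normal stool samples.
    ek      : (n,) mean residuals e_{i,k} over type-k stool samples.
    e_other : (G, n) mean residuals e_{i,g}, one row per other type g != k.
    """
    e0, ek = np.asarray(e0), np.asarray(ek)
    e_other = np.atleast_2d(e_other)
    return ((np.abs(e0) < c20)
            & (np.abs(ek) > c2k)
            & np.all(np.abs(ek - e_other) > c3k, axis=0))

# Hypothetical data: 3 CpG sites, 3 samples per cell type.
x = np.array([[0.90, 0.92, 0.91], [0.1, 0.5, 0.9], [0.50, 0.51, 0.49]])
y = np.array([[0.10, 0.12, 0.11], [0.2, 0.2, 0.2], [0.48, 0.50, 0.52]])
stable = stable_sites(x, y, c0=0.01)
diff = differential_sites(x.mean(axis=1), y.mean(axis=1), c1=0.5)
candidates = stable & diff          # sites kept as C_0 candidates

target = target_sites(e0=[0.01, 0.2, 0.0], ek=[0.3, 0.4, 0.02],
                      e_other=[[0.0, 0.1, 0.0]], c20=0.05, c2k=0.1, c3k=0.1)
print(stable, diff, target)
```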

2. Estimate the Component Fractions of the New Stool Samples to be Tested

Recall that $x^C$ and $y^C$ are the mean vectors of the methylation levels of the training colon-derived and training ileum-derived cell data at the CpG sites in the reference set C. For any new stool sample t to be tested, let $z_t^C$ be the observed methylation levels of the CpG sites in C. Regress $z_t^C$ on $x^C$ and $y^C$, with the constraints that the intercept is 0 and the coefficients are non-negative and sum to 1. The estimated coefficients are the estimated fractions of the component methylomes in stool sample t.
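With only two reference components, this constrained regression has a closed form. Writing the fit as $a\,x^C + (1-a)\,y^C$, the squared-error loss is a convex quadratic in the scalar a, so the constrained minimizer is the unconstrained least-squares value clipped to [0, 1]. The sketch below assumes NumPy; `fit_two_component` and `predict` are hypothetical names, and the data are made up. With more than two components, a general quadratic-programming solver would be needed instead.

```python
import numpy as np

def fit_two_component(z, x, y):
    """Fit z ~ a*x + (1 - a)*y with zero intercept and a in [0, 1].

    Returns (a, 1 - a): the estimated colon and ileum fractions.
    """
    z, x, y = (np.asarray(v, dtype=float) for v in (z, x, y))
    d = x - y
    denom = d @ d
    if denom == 0:
        raise ValueError("x and y must differ at some reference site")
    # The loss ||(z - y) - a*d||^2 is convex in a, so clipping the
    # unconstrained minimiser to [0, 1] gives the constrained one.
    a = float(np.clip((z - y) @ d / denom, 0.0, 1.0))
    return a, 1.0 - a

def predict(a, x_T, y_T):
    """Apply a fitted mixture to target-site means (as in step 3)."""
    return a * np.asarray(x_T, dtype=float) + (1.0 - a) * np.asarray(y_T, dtype=float)

# Hypothetical reference-site means and a stool sample that is 30% colon.
x_C = np.array([0.9, 0.8, 0.1])
y_C = np.array([0.1, 0.2, 0.7])
z_C = 0.3 * x_C + 0.7 * y_C
a_hat, b_hat = fit_two_component(z_C, x_C, y_C)
print(round(a_hat, 6), round(b_hat, 6))  # 0.3 0.7
```

An observation lying outside the segment between the two references (for example `[1.0, 0.9, 0.0]`) is clipped to a fraction of exactly 1.0, which is how the non-negativity and sum-to-one constraints act in practice.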

3. Test if the New Stool Samples are from the k^(th) Type of Abnormal Individual.

For the new stool sample t, let $x^{T_k}$ and $y^{T_k}$ be the mean vectors of the methylation levels of the training colon-derived and training ileum-derived cell data at the CpG sites in the target set $T_k$ identified in step 1 of this algorithm. Apply the fitted regression model obtained in step 2 of this algorithm to $x^{T_k}$ and $y^{T_k}$ to predict the methylation levels of the CpG sites in $T_k$ for sample t under the hypothesis that sample t is from a normal individual. Let $n_k$ be the number of CpG sites in $T_k$, and define the functions

$f_k(x_1, \ldots, x_{n_k}) = \sum_i (-1)^{I(e_{i,k} - e_i)}\, x_i, \qquad f_{k,g}(x_1, \ldots, x_{n_k}) = \sum_i (-1)^{I(e_{i,k} - e_{i,g})}\, x_i,$

where $I(\cdot) = I_{(-\infty, 0)}(\cdot)$ is the indicator function of the interval $(-\infty, 0)$, and $e_i$, $e_{i,k}$, and $e_{i,g}$ are the estimates obtained in step 1.5 of the algorithm. Sample t is classified as coming from an individual with the k-th type of abnormal phenotype if

$f_k(e_{1,t} - e_1, \ldots, e_{n_k,t} - e_{n_k}) > c_{4,k} \quad \text{and} \quad f_{k,g}(e_{1,t} - e_{1,g}, \ldots, e_{n_k,t} - e_{n_k,g}) > c_{5,g} \ \text{for all } g \ne k,$

where $e_{i,t}$ is the difference between the observed methylation level of CpG site $i \in T_k$ in sample t and the value predicted by the fitted model from step 2, and g ranges over every type of abnormal phenotype other than the k-th.
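The decision rule in step 3 is a pair of signed sums: a site's deviation counts positively when it points in the direction in which type-k samples deviated during training, and negatively otherwise. A minimal NumPy sketch under these definitions follows; `signed_sum`, `is_type_k`, the dictionary layout for the other types g, and all thresholds and values are hypothetical.

```python
import numpy as np

def signed_sum(values, direction):
    """sum_i (-1)^{I(d_i < 0)} * x_i, where d_i is the expected direction."""
    sign = np.where(np.asarray(direction, dtype=float) < 0, -1.0, 1.0)
    return float(sign @ np.asarray(values, dtype=float))

def is_type_k(e_t, e0, ek, e_other, c4k, c5):
    """Step 3 rule for one test sample t over the sites in T_k.

    e_t     : (n_k,) observed minus predicted methylation for sample t.
    e0, ek  : (n_k,) training means e_i (normal) and e_{i,k} (type k).
    e_other : dict g -> (n_k,) training means e_{i,g}, for each type g != k.
    c5      : dict g -> threshold c_{5,g}.
    """
    e_t, e0, ek = (np.asarray(v, dtype=float) for v in (e_t, e0, ek))
    # f_k(e_t - e_0), signed by the direction of e_{i,k} - e_i.
    if signed_sum(e_t - e0, ek - e0) <= c4k:
        return False
    # f_{k,g}(e_t - e_g) must also exceed c_{5,g} for every other type g.
    for g, eg in e_other.items():
        eg = np.asarray(eg, dtype=float)
        if signed_sum(e_t - eg, ek - eg) <= c5[g]:
            return False
    return True

# Hypothetical two-site target set T_k and one competing type g = 1.
e0, ek, e_other = [0.0, 0.0], [0.3, -0.2], {1: [0.1, 0.1]}
print(is_type_k([0.25, -0.15], e0, ek, e_other, 0.2, {1: 0.2}))  # True
print(is_type_k([0.0, 0.0], e0, ek, e_other, 0.2, {1: 0.2}))     # False
```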

Other implementations of the algorithm of this Example can be developed by modifying the simple implementation presented above. Specifically, it is not necessary to assume that only two component reference methylomes make up the stool methylome, nor is it necessary to estimate the component methylomes themselves. Instead, a set of predictor methylomes that are themselves mixtures of the component reference methylomes can be collected, as long as the number of predictor methylomes equals the number of reference component methylomes and the mixture vectors of the predictor methylomes are linearly independent. For example, the predictor methylomes can be methylomes of stool samples with known, different proportions of colon-derived and ileum-derived cell DNA.
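Predictor methylomes can stand in for the component methylomes because, when the mixture vectors $q_1, \ldots, q_R$ are linearly independent, the predictors span the same space as the components, so a fit on either basis gives the same prediction at the target sites. The NumPy sketch below illustrates this with random, made-up methylomes. It uses unconstrained least squares, because the coefficients on the predictors, $Q^{-1} p$, still sum to 1 when every $q_r$ does but need not be non-negative.

```python
import numpy as np

rng = np.random.default_rng(0)
n_C, n_T, R = 50, 5, 3
M_C = rng.uniform(0, 1, size=(n_C, R))  # hypothetical components at sites C
M_T = rng.uniform(0, 1, size=(n_T, R))  # same components at target sites T

# Columns q_r: linearly independent mixture vectors, each summing to 1.
Q = np.array([[0.6, 0.2, 0.1],
              [0.3, 0.5, 0.2],
              [0.1, 0.3, 0.7]])
N_C, N_T = M_C @ Q, M_T @ Q             # predictor methylomes n_r = [m_1..m_R] q_r

p = np.array([0.2, 0.5, 0.3])           # true mixture of a normal test sample
z_C = M_C @ p                           # its observed levels at C (noise-free)

# Fit on the component basis and on the predictor basis.
coef_M, *_ = np.linalg.lstsq(M_C, z_C, rcond=None)
coef_N, *_ = np.linalg.lstsq(N_C, z_C, rcond=None)

# Both bases give the same prediction at the target sites.
pred_M = M_T @ coef_M
pred_N = N_T @ coef_N
print(np.allclose(pred_M, pred_N))  # True
```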

The algorithm of this Example uses the difference between the observed and predicted methylation levels in certain target regions as the test statistic for determining whether the methylome of a stool sample has been affected by some type of intestinal tissue abnormality. To illustrate the advantage of this approach, assume that the mixture vector $p_j$ of the methylome of a normal stool sample j follows a Dirichlet distribution with parameters $\alpha_1 = \cdots = \alpha_R$, and that, for CpG site i, the methylation levels in the R reference component methylomes are $m_{i,r} = (r-1)/(R-1)$ for $r = 1, \ldots, R$. It can be shown that the methylation level of site i in sample j then has a mean of 0.5 and a variance of

$\frac{R + 1}{12\left( {R - 1} \right)\left( {{R\;\alpha_{1}} + 1} \right)}.$
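The "it can be shown" step can be filled in with the standard moments of a symmetric Dirichlet distribution. For the coverage terms used next, the derivation additionally assumes (an assumption not stated in the text) that, given the true level $Y = \sum_r m_{i,r} p_{r,j}$, the N reads at site i are independent Bernoulli(Y) draws, so that $\operatorname{Var}(z_{i,j} \mid Y) = Y(1-Y)/N$.

$$
\begin{aligned}
\mathbb{E}[p_{r,j}] &= \frac{1}{R}, \qquad
\operatorname{Var}(p_{r,j}) = \frac{R-1}{R^{2}(R\alpha_1+1)}, \qquad
\operatorname{Cov}(p_{r,j},p_{s,j}) = -\frac{1}{R^{2}(R\alpha_1+1)} \quad (r \ne s),\\
\sum_{r=1}^{R} m_{i,r} &= \frac{R}{2}, \qquad
\sum_{r=1}^{R} m_{i,r}^{2} = \frac{R(2R-1)}{6(R-1)}, \qquad
\mathbb{E}[Y] = \frac{1}{R}\cdot\frac{R}{2} = \frac{1}{2},\\
\operatorname{Var}(Y) &= \frac{R\sum_r m_{i,r}^{2} - \left(\sum_r m_{i,r}\right)^{2}}{R^{2}(R\alpha_1+1)}
= \frac{R^{2}(R+1)/\bigl(12(R-1)\bigr)}{R^{2}(R\alpha_1+1)}
= \frac{R+1}{12(R-1)(R\alpha_1+1)} \equiv V,\\
\operatorname{Var}(z_{i,j}) &= \frac{\mathbb{E}[Y(1-Y)]}{N} + V
= \frac{1}{N}\left(\frac{1}{4} - V\right) + V
= \frac{1}{4N} + \frac{N-1}{N}\,V,\\
\operatorname{Var}(z_{i,j} - Y) &= \frac{\mathbb{E}[Y(1-Y)]}{N} = \frac{1}{4N} - \frac{V}{N}.
\end{aligned}
$$

The last two lines follow from the law of total variance, using $\mathbb{E}[Y(1-Y)] = \mathbb{E}[Y] - \operatorname{Var}(Y) - \mathbb{E}[Y]^2 = \tfrac{1}{4} - V$.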

If there is a methyl-seq library of sample j with a coverage of N for CpG site i, the variance of the measured methylation level z_(i,j) is

$\sigma_{1}^{2} = {\frac{1}{4N} + {\frac{N - 1}{N}{\frac{R + 1}{12\left( {R - 1} \right)\left( {{R\;\alpha_{1}} + 1} \right)}.}}}$

In other words, if $z_{i,j}$ is used as a test statistic to detect abnormal intestinal tissue from a stool sample, the test statistic has a variance of $\sigma_1^2$ under the null hypothesis. In the presently disclosed algorithm, however, the mixture vector $p_j$ is estimated first, and $z_{i,j}$ is then predicted by $\sum_r m_{i,r} p_{r,j}$. Methyl-seq data can cover millions of CpG sites in each library, and the variance of the coefficient estimates in a linear regression model is inversely proportional to the sample size (here, the number of CpG sites used in the fit). The mixture vector $p_j$ can therefore be estimated very accurately, even after accounting for the tendency of adjacent CpG sites to have correlated methylation levels. Assuming that an accurate estimate of $\sum_r m_{i,r} p_{r,j}$ is obtained, so that the estimation error can be ignored, the variance of the difference $z_{i,j} - \sum_r m_{i,r} p_{r,j}$ between the observed methylation level and the prediction is

$\frac{1}{4N} - {\frac{1}{N}{\frac{R + 1}{12\left( {R - 1} \right)\left( {{R\;\alpha_{1}} + 1} \right)}.}}$

In other words, under the null hypothesis, the test statistic $z_{i,j} - \sum_r m_{i,r} p_{r,j}$ used in the presently disclosed algorithm has a much smaller variance than the alternative test statistic $z_{i,j}$: subtracting the two expressions shows that the reduction equals exactly the mixture-induced term $\frac{R+1}{12(R-1)(R\alpha_1+1)}$. This in turn means that the presently disclosed test achieves higher power at the same type I error rate.
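The two null variances can also be checked numerically. The NumPy simulation below draws Dirichlet mixture vectors, forms $Y = \sum_r m_{i,r} p_{r,j}$ with $m_{i,r} = (r-1)/(R-1)$, and models the N reads at the site as Binomial(N, Y). The binomial read model is an assumption about sequencing noise rather than something the text specifies, and the parameter values and function names are hypothetical.

```python
import numpy as np

def simulate_variances(R=4, alpha=2.0, N=30, n_samples=200_000, seed=1):
    """Monte Carlo estimates of Var(z_ij) and Var(z_ij - Y) at one CpG site.

    Assumes p_j ~ Dirichlet(alpha, ..., alpha), m_{i,r} = (r - 1)/(R - 1),
    and N reads at the site drawn as Binomial(N, Y_j).
    """
    rng = np.random.default_rng(seed)
    m = np.arange(R) / (R - 1)                       # m_{i,r} for r = 1..R
    p = rng.dirichlet(np.full(R, alpha), size=n_samples)
    y = p @ m                                        # true level sum_r m_{i,r} p_{r,j}
    z = rng.binomial(N, y) / N                       # measured level z_{i,j}
    return z.var(), (z - y).var()

def theory(R, alpha, N):
    """Closed-form null variances sigma_1^2 and Var(z_ij - Y) from the text."""
    v = (R + 1) / (12 * (R - 1) * (R * alpha + 1))
    return 1 / (4 * N) + (N - 1) / N * v, 1 / (4 * N) - v / N

emp = simulate_variances()
th = theory(4, 2.0, 30)
print(emp, th)
```

With the defaults (R = 4, $\alpha_1$ = 2, N = 30), the formulas give roughly 0.023 for $z_{i,j}$ and 0.0078 for the residual, and the simulated variances should agree with them to within Monte Carlo error.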

Examples of Embodiments for Example 5

E1. A method for diagnosing, prognosing, classifying, and/or monitoring necrotizing enterocolitis in a mammal, comprising:

(a) obtaining a sample from the mammal;

(b) determining the methylation status and/or level of one or more genomic loci in the sample;

(c) comparing the methylation status and/or level of the one or more genomic loci to a reference or a set of predicted values; and

(d) diagnosing necrotizing enterocolitis in the mammal,

wherein the difference in the methylation status and/or level of the one or more genomic loci in the sample compared to the reference or a set of predicted values indicates the presence of necrotizing enterocolitis in the mammal.

E2. The method of embodiment E1, wherein an increase in the level of methylation of the one or more genomic loci in the sample indicates the presence of necrotizing enterocolitis in the mammal or a decrease in the level of methylation of the one or more genomic loci in the sample indicates the presence of necrotizing enterocolitis in the mammal.

E3. The method of embodiment E1, wherein a decrease in the level of methylation of at least one of the one or more genomic loci in the sample and an increase in the level of methylation of at least one of the one or more genomic loci in the sample indicates the presence of necrotizing enterocolitis in the mammal.

E4. A method of treating necrotizing enterocolitis in a mammal, comprising:

(a) obtaining a sample from the mammal;

(b) determining the methylation status and/or level of one or more genomic loci present in the sample;

(c) comparing the methylation status and/or level of the one or more genomic loci to a reference or a set of predicted values;

(d) diagnosing necrotizing enterocolitis in the mammal, wherein the difference in the methylation status and/or level of the one or more genomic loci in the sample compared to the reference or a set of predicted values indicates the presence of necrotizing enterocolitis in the mammal; and

(e) administering an antibiotic therapy.

E5. The method of any one of embodiments E1-E4, wherein the reference is the methylation status and/or level of the one or more genomic loci in a sample obtained from a mammal that does not have necrotizing enterocolitis.

E6. The method of any one of embodiments E1-E5, wherein said sample is a stool sample.

E7. A method of treating necrotizing enterocolitis comprising:

(a) measuring the methylation status and/or level of one or more genomic loci present in a sample from a mammal prior to a treatment of necrotizing enterocolitis;

(b) measuring the methylation status and/or level of one or more genomic loci present in a sample from the mammal during the treatment of necrotizing enterocolitis; and

(c) continuing the treatment if the difference in the methylation status and/or level of the one or more genomic loci between the samples from prior to and during the treatment of necrotizing enterocolitis indicates the mammal is responsive to the treatment.

E8. The method of embodiment E7, further comprising (d) administering a different treatment to the mammal if the difference in the methylation status and/or level of the one or more genomic loci between the samples from prior to and during the treatment of necrotizing enterocolitis indicates the mammal is not responsive to the treatment.

E9. The method of embodiment E7, wherein an increase in the level of methylation of the one or more genomic loci in the sample indicates the mammal is responsive to the treatment, or a decrease in the level of methylation of the one or more genomic loci in the sample indicates the mammal is responsive to the treatment.

E10. The method of embodiment E7, wherein a decrease in the level of methylation of at least one of the one or more genomic loci in the sample and an increase in the level of methylation of at least one of the one or more genomic loci in the sample indicates the mammal is responsive to the treatment.

E11. The method of embodiment E7, wherein an increase in the level of methylation of the one or more genomic loci in the sample indicates the mammal is not responsive to the treatment, or a decrease in the level of methylation of the one or more genomic loci in the sample indicates the mammal is not responsive to the treatment.

E12. The method of embodiment E7, wherein a decrease in the level of methylation of at least one of the one or more genomic loci in the sample and an increase in the level of methylation of at least one of the one or more genomic loci in the sample indicates the mammal is not responsive to the treatment.

E13. The method of any one of embodiments E7-E12, wherein said sample is a stool sample.

E14. The method of any one of embodiments E1-E13, wherein the one or more genomic loci comprise one or more CpG sites.

E15. The method of any one of embodiments E1-E14, wherein the one or more genomic loci are present within nucleic acids isolated from the sample.

E16. The method of any one of embodiments E1-E15, wherein the one or more genomic loci are present within cell-free nucleic acids isolated from the sample.

E17. A method of treating necrotizing enterocolitis, comprising:

(a) diagnosing necrotizing enterocolitis in a mammal by utilization of the algorithm disclosed in Example Embodiment J; and

(b) administering an antibiotic therapy to said mammal to treat said necrotizing enterocolitis.

E18. The method of any one of embodiments E1-E17, wherein said mammal is a human.

E19. A kit for diagnosing, prognosing, and/or monitoring necrotizing enterocolitis in a mammal comprising a means for determining and/or detecting the methylation status of one or more genomic loci.

E20. The kit of embodiment E19, wherein the means comprises one or more primers and/or probes for determining and/or detecting the methylation status of the one or more genomic loci.

Abstract of this Example (Example 5)

This Example provides methods for diagnosing, prognosing, monitoring, classifying, and/or treating necrotizing enterocolitis. For example, algorithms, kits, and methods for diagnosing, prognosing, monitoring, classifying, and/or treating necrotizing enterocolitis are provided.

Example 6: Methods and Materials for Assessing and Treating Ovarian Cancer

Field of this Example

This Example relates to methods and materials involved in assessing a mammal (e.g., a human) for ovarian cancer and/or treating a mammal (e.g., a human) having or developing ovarian cancer. For example, this Example provides methods and materials for using DNA methylation profiles of nucleic acid within a liquid biopsy (e.g., a plasma sample, a urine sample, a peritoneal fluid sample, or a cervical swab sample) to determine whether or not a mammal has, or is developing, ovarian cancer. This Example also provides methods, algorithms, and kits for diagnosing, prognosing, monitoring, classifying, and/or treating ovarian cancer.

Background of this Example

Ovarian cancer often remains undetected until it becomes incurable or challenging to treat. Early diagnosis is frequently impossible or dangerously invasive by current methods. There is therefore great interest in the development of practical and accurate methods for the non-invasive detection and phenotyping of ovarian cancer.

Summary of this Example

This Example provides methods and materials involved in assessing a mammal (e.g., a human) for ovarian cancer and/or treating a mammal (e.g., a human) having or developing ovarian cancer. For example, this Example provides methods and materials for using DNA methylation profiles of nucleic acid within a liquid biopsy (e.g., a plasma sample, a urine sample, a peritoneal fluid sample, or a cervical swab sample) to determine whether or not a mammal has, or is developing, ovarian cancer. Determining whether a mammal (e.g., a human) has, or is likely to develop, ovarian cancer by assessing DNA methylation profiles of nucleic acid within a liquid biopsy (e.g., a plasma sample, a urine sample, a peritoneal fluid sample, or a cervical swab sample) can aid in identifying mammals (e.g., humans) that should be treated in a particular manner (e.g., by administering chemotherapy, by administering immunotherapy, and/or by performing a surgical treatment), for example, early in the disease process.

This Example also provides methods for diagnosing, prognosing, monitoring, classifying, and/or treating ovarian cancer. For example, the methods described herein can include determining the methylation status of one or more genomic loci in a sample (e.g., a plasma sample, a urine sample, a peritoneal fluid sample, or a cervical swab sample) of a mammal (e.g., a female human). This Example further provides algorithms and kits for diagnosing, prognosing, monitoring, classifying, and/or treating ovarian cancer.

In one aspect, this Example provides a method for diagnosing, prognosing, classifying, and/or monitoring ovarian cancer in a mammal (e.g., a human female) comprising: (a) obtaining a sample (e.g., a plasma sample, a urine sample, a peritoneal fluid sample, or a cervical swab sample) from the mammal; (b) determining the methylation status and/or level of one or more genomic loci in the sample; (c) comparing the methylation status and/or level of the one or more genomic loci to a reference or a set of predicted values; and (d) identifying the mammal as having ovarian cancer, wherein the difference in the methylation status and/or level of the one or more genomic loci in the sample compared to the reference or a set of predicted values indicates the presence of ovarian cancer in the mammal.

In another aspect, this Example provides a method of treating ovarian cancer in a mammal (e.g., a human female) comprising: (a) obtaining a sample (e.g., a plasma sample, a urine sample, a peritoneal fluid sample, or a cervical swab sample) from the mammal; (b) determining the methylation status and/or level of one or more genomic loci present in the sample; (c) comparing the methylation status and/or level of the one or more genomic loci to a reference or a set of predicted values; (d) identifying the mammal as having ovarian cancer, wherein the difference in the methylation status and/or level of the one or more genomic loci in the sample compared to the reference or a set of predicted values indicates the presence of ovarian cancer in the mammal; and (e) treating the mammal by administering a chemotherapy, by administering an immunotherapy, and/or by performing a surgical treatment.

In some cases, an increase in the level of methylation of the one or more genomic loci in a sample (e.g., a plasma sample, a urine sample, a peritoneal fluid sample, or a cervical swab sample) can indicate the presence of ovarian cancer in the mammal, or a decrease in the level of methylation of the one or more genomic loci in a sample (e.g., a plasma sample, a urine sample, a peritoneal fluid sample, or a cervical swab sample) can indicate the presence of ovarian cancer in the mammal. In some cases, a decrease in the level of methylation of at least one of the one or more genomic loci in a sample and an increase in the level of methylation of at least one of the one or more genomic loci in the sample can indicate the presence of ovarian cancer in the mammal.

In some cases, the reference can be the methylation status and/or level of the one or more genomic loci in a sample (e.g., a plasma sample, a urine sample, a peritoneal fluid sample, or a cervical swab sample) obtained from a mammal (e.g., a human female) that does not have ovarian cancer.

In another aspect, this Example provides a method of treating ovarian cancer in a mammal (e.g., a human female) comprising: (a) measuring the methylation status and/or level of one or more genomic loci present in a sample (e.g., a plasma sample, a urine sample, a peritoneal fluid sample, or a cervical swab sample) from the mammal prior to a treatment of ovarian cancer; (b) measuring the methylation status and/or level of one or more genomic loci present in a sample (e.g., a plasma sample, a urine sample, a peritoneal fluid sample, or a cervical swab sample) from the mammal during the treatment of ovarian cancer; and (c) continuing the treatment if the difference in the methylation status and/or level of the one or more genomic loci between the samples from prior to and during the treatment of ovarian cancer indicates the subject is responsive to the treatment. In some cases, the method further comprises (d) administering a different treatment to the mammal if the difference in the methylation status and/or level of the one or more genomic loci between the samples from prior to and during the treatment of ovarian cancer indicates the subject is not responsive to the treatment.

In some cases, an increase in the level of methylation of the one or more genomic loci in the sample indicates that the mammal is responsive to the treatment, or a decrease in the level of methylation of the one or more genomic loci in the sample indicates the mammal is responsive to the treatment. In certain embodiments, a decrease in the level of methylation of at least one of the one or more genomic loci in the sample and an increase in the level of methylation of at least one of the one or more genomic loci in the sample indicates the mammal is responsive to the treatment. In certain embodiments, an increase in the level of methylation of the one or more genomic loci in the sample indicates the mammal is not responsive to the treatment, or a decrease in the level of methylation of the one or more genomic loci in the sample indicates the mammal is not responsive to the treatment. In certain embodiments, a decrease in the level of methylation of at least one of the one or more genomic loci in the sample and an increase in the level of methylation of at least one of the one or more genomic loci in the sample indicates the mammal is not responsive to the treatment.

This Example further provides algorithms for diagnosing and/or monitoring a mammal having ovarian cancer. In certain embodiments, the algorithm can be used to classify ovarian cancer of a mammal (e.g., a human female).

In another aspect, this Example provides a kit for diagnosing, prognosing, and/or monitoring ovarian cancer in a mammal comprising a means for determining and/or detecting the methylation status of one or more genomic loci. In certain embodiments, the means comprises one or more primers and/or probes for determining and/or detecting the methylation status of the one or more genomic loci.

In certain embodiments, the one or more genomic loci are present within nucleic acids isolated from the sample. In certain embodiments, the one or more genomic loci are present within cell-free nucleic acids isolated from the sample.

Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention pertains. Although methods and materials similar or equivalent to those described herein can be used in the practice or testing of the present invention, suitable methods and materials are described below. All publications, patent applications, patents, and other references mentioned herein are incorporated by reference in their entirety. In case of conflict, the present specification, including definitions, will control. In addition, the materials, methods, and examples are illustrative only and not intended to be limiting.

Other features and advantages of the invention will be apparent from the following detailed description, and from the claims.

Description of this Example

This Example provides methods for diagnosing, prognosing, monitoring, classifying, and/or treating ovarian cancer. For example, the methods described herein can include determining the methylation status of one or more genomic loci in a sample of a mammal (e.g., a human female). In some cases, the methods described herein can include the use of an algorithm to diagnose, prognose, monitor, classify, and/or assist in the treatment of ovarian cancer. Non-limiting embodiments of this Example are described by the present specification and Examples.

Unless defined otherwise, all technical and scientific terms used in this Example generally have their ordinary meanings in the art, within the context of this Example and in the specific context where each term is used. The following references provide one of skill with a general definition of many of the terms used in this Example: Singleton et al., Dictionary of Microbiology and Molecular Biology (2nd ed. 1994); The Cambridge Dictionary of Science and Technology (Walker ed., 1988); The Glossary of Genetics, 5th Ed., R. Rieger et al. (eds.), Springer Verlag (1991); and Hale & Margham, The Harper Collins Dictionary of Biology (1991). Certain terms are discussed below, or elsewhere in the specification, to provide additional guidance to the practitioner in describing the compositions and methods of this Example and how to make and use them.

The terms “comprise(s),” “include(s),” “having,” “has,” “can,” “contain(s),” and variants thereof, as used herein, are intended to be open-ended transitional phrases, terms or words that do not preclude the possibility of additional acts or structures. The singular forms “a,” “an,” and “the” include plural references unless the context clearly dictates otherwise. The present Example also contemplates other embodiments “comprising,” “consisting of” and “consisting essentially of,” the embodiments or elements presented herein, whether explicitly set forth or not.

As used herein, the use of the word “a” or “an” when used in conjunction with the term “comprising” in the claims and/or the specification may mean “one,” but it is also consistent with the meaning of “one or more,” “at least one,” and “one or more than one.” Still further, the terms “having,” “including,” “containing,” and “comprising” are interchangeable, and one of skill in the art is cognizant that these terms are open ended terms.

The term “about” or “approximately” means within an acceptable error range for the particular value as determined by one of ordinary skill in the art, which will depend in part on how the value is measured or determined, i.e., the limitations of the measurement system. For example, “about” can mean within 3 or more than 3 standard deviations, per the practice in the art. Alternatively, “about” can mean a range of up to 20%, preferably up to 10%, more preferably up to 5%, and more preferably still up to 1% of a given value. Alternatively, particularly with respect to biological systems or processes, the term can mean within an order of magnitude, preferably within 5-fold, and more preferably within 2-fold, of a value.

In certain embodiments, the term “biomarker” refers to a marker (e.g., DNA methylation status) that allows detection of a disease and/or disorder in an individual, including detection of the disease or the disorder in its early stages. In certain embodiments, the term “biomarker” refers to a marker (e.g., DNA methylation status) that allows the characterization of a phenotype of a disease and/or a disorder in an individual. Early stage of a disease, as used herein, refers to the time period between the onset of the disease and the time point that signs or symptoms of the disease emerge. In certain non-limiting embodiments, the presence, absence, and/or level of a biomarker in a sample of a mammal (e.g., a human) is determined by comparing to a reference control.

The terms “reference sample,” “reference control,” “control,” or “reference,” as used interchangeably herein, refer to a control for a methylation status of a genomic locus that is to be detected in a sample of a mammal. In certain embodiments, a reference sample can be a sample from a healthy individual, e.g., an individual that does not have ovarian cancer. In certain embodiments, a reference sample can be a sample from a control individual that does not have the disease or phenotype to be detected by a biomarker disclosed herein. In certain embodiments, a control or reference can be the presence, absence, and/or a particular level of a methylation state of a genomic locus in a healthy individual. In certain embodiments, a reference can be a predetermined presence, absence, and/or particular level of a methylation state of a genomic locus that indicates a subject does not have ovarian cancer. In certain embodiments, a reference can be the methylation status of a locus in an individual having a disease or a phenotype, e.g., an individual that has ovarian cancer, where the methylation status of the locus is known not to be associated with the disease or the phenotype.

The term “a set of predicted values” refers to the methylation status of certain genomic loci for a sample. The status of those loci is not directly measured from that sample. Rather, it is inferred from measurements of other loci for that sample and/or measurements of other samples. The inference of the predicted values is based on mathematical or statistical models. The models usually assume that the sample for which the methylation status of those loci is to be predicted has a normal phenotype. This assumption may be correct or incorrect, but its correctness is not required for the inference of the predicted values.
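The idea of inferring a set of predicted values can be illustrated with a minimal sketch. Purely as an assumption for illustration (this Example does not name a model), the sketch fits an ordinary least-squares regression on reference samples presumed to have a normal phenotype, then predicts the unmeasured loci of a new sample from its measured loci. The function names `fit_predictor` and `predict_values` are hypothetical.

```python
import numpy as np


def fit_predictor(ref_other: np.ndarray, ref_target: np.ndarray) -> np.ndarray:
    """Fit least-squares weights mapping measured loci to the loci to be predicted.

    ref_other:  (n_samples, n_other) methylation levels of loci measured in
                reference samples assumed to have a normal phenotype.
    ref_target: (n_samples, n_target) methylation levels of the loci whose
                values will later be predicted rather than measured.
    Returns an (n_other + 1, n_target) weight matrix; row 0 is the intercept.
    """
    design = np.column_stack([np.ones(len(ref_other)), ref_other])
    weights, *_ = np.linalg.lstsq(design, ref_target, rcond=None)
    return weights


def predict_values(weights: np.ndarray, sample_other: np.ndarray) -> np.ndarray:
    """Infer the target loci of one new sample from its measured loci only."""
    x = np.concatenate([[1.0], sample_other])
    return x @ weights
```

Because only reference samples enter the fit, the predicted values describe what a normal-phenotype sample would be expected to show; whether that assumption holds for the new sample does not affect how the prediction is computed.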

The term “slightly invasive or non-invasive method” refers to a method that does not involve the removal of tissues by biopsy from the ovaries. In certain embodiments, slightly invasive or non-invasive methods, as described herein, include obtaining plasma, urine, a peritoneal fluid sample, or a cervical swab from a subject.

The term “patient” or “subject,” as used interchangeably herein, refers to any warm-blooded animal, e.g., human or non-human. Non-limiting examples of non-human subjects include mammals, non-human primates, dogs, cats, mice, rats, guinea pigs, rabbits, fowl, pigs, horses, cows, goats, sheep, etc. In certain embodiments, the subject is human.

The term “nucleic acid,” “nucleic acid molecule,” or “polynucleotide” includes any compound and/or substance that comprises a polymer of nucleotides. Each nucleotide is composed of a base, specifically a purine or pyrimidine base (i.e., cytosine (C), guanine (G), adenine (A), thymine (T), or uracil (U)), a sugar (i.e., deoxyribose or ribose), and a phosphate group. In certain embodiments, the nucleic acid molecule is described by the sequence of bases, whereby said bases represent the primary structure (linear structure) of a nucleic acid molecule. The sequence of bases is typically represented from 5′ to 3′. These terms encompass deoxyribonucleic acid (DNA) including, e.g., complementary DNA (cDNA) and genomic DNA, ribonucleic acid (RNA), in particular messenger RNA (mRNA), synthetic forms of DNA or RNA, and mixed polymers comprising two or more of these molecules. The herein described nucleic acid molecule can contain naturally occurring or non-naturally occurring nucleotides. Examples of non-naturally occurring nucleotides include modified nucleotide bases with derivatized sugars or phosphate backbone linkages or chemically modified residues.

The term “isolated” (e.g., isolated genomic DNA) refers to a biological component that has been substantially separated or purified away from other biological components in the cell of the organism in which the component naturally occurs, e.g., other chromosomal and extra-chromosomal DNA and RNA, proteins, and organelles. Nucleic acids, e.g., DNA, that have been “isolated” include nucleic acids purified by standard purification methods.

The term “genomic locus” or “genomic DNA locus,” as used herein, refers to any fixed position in a genome. For example, a genomic locus can refer to a genomic element, a chromosomal region, a gene, a region of a gene, e.g., an exon or intron, a regulatory region of a gene, e.g., a promoter or enhancer, a CpG site, a CpG island, or a CpG island shore. For example, a genomic locus can include one or more CpG sites, e.g., between about 1 and about 100 CpG sites. In certain embodiments, a genomic locus can be of any particular length, e.g., between about 1 and about 10,000 nucleotides in length.

As used interchangeably herein, “methylation state,” “methylation profile,” “methylation status,” and “methylation level” refer to the presence, absence, percentage, and/or quantity of methylation at a particular nucleotide, or nucleotides, within a DNA region, e.g., a genomic locus. The methylation status of a particular DNA sequence (e.g., a genomic locus) can indicate the methylation state of every nucleotide in the sequence, the methylation state of any of the nucleotides (e.g., cytosines) in the sequence, the methylation state of a subset of the nucleotides (e.g., of cytosines), the percentage or fraction of methylated cytosines at any particular stretch of nucleotides within the sequence, or the average rate of methylation of all the cytosines (or a subset of the cytosines) present in a nucleic acid.
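Two of these readouts, the fraction of methylated cytosines across a stretch of sequence and the average rate of methylation across its cytosines, can be sketched as follows. The sketch assumes that per-CpG read counts (reads reporting a methylated versus an unmethylated cytosine) are already available for a locus; the `CpGCounts` record and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CpGCounts:
    position: int       # coordinate of the CpG cytosine
    methylated: int     # reads reporting a methylated cytosine at this site
    unmethylated: int   # reads reporting an unmethylated cytosine at this site


def site_fractions(sites):
    """Fraction of methylated calls at each CpG site with at least one read."""
    return {s.position: s.methylated / (s.methylated + s.unmethylated)
            for s in sites if s.methylated + s.unmethylated > 0}


def locus_fraction(sites):
    """Pooled fraction of methylated cytosine calls across all sites in a locus."""
    meth = sum(s.methylated for s in sites)
    total = sum(s.methylated + s.unmethylated for s in sites)
    return meth / total if total else float("nan")


def locus_mean_rate(sites):
    """Unweighted average of the per-site fractions over covered sites."""
    fractions = site_fractions(sites)
    return sum(fractions.values()) / len(fractions) if fractions else float("nan")
```

The two locus-level summaries differ when coverage is uneven: with one site at 9 of 10 methylated reads and another at 1 of 2, the pooled fraction is 10/12 (about 0.83), while the average of the site fractions is 0.70.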

As used herein, a “methylated nucleic acid molecule” refers to a nucleic acid molecule that contains one or more methylated nucleotides.

As used herein, a “CpG site” or “methylation site” is a nucleotide within a nucleic acid that is susceptible to methylation either by naturally occurring events in vivo or by an event instituted to chemically methylate the nucleotide in vitro.

A “CpG island,” as used herein, describes a segment of a nucleic acid, e.g., a DNA sequence, that has a high frequency of CpG dinucleotides. See, e.g., Illingworth and Bird, FEBS Letters, 2009; 583:1713-1720. For example, Yamada et al. (Genome Research, 2004; 14:247-266) have described a set of standards for determining a CpG island: it must be at least 400 nucleotides in length, have a GC content greater than 50%, and have a ratio of observed to expected CpG frequency (OCF/ECF) greater than 0.6. Others (Takai et al., Proc. Natl. Acad. Sci. U.S.A., 2002; 99:3740-3745) have defined a CpG island less stringently as a sequence of at least 200 nucleotides in length, having a greater than 50% GC content and an OCF/ECF ratio greater than 0.6.
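The criteria above can be checked with a short function. This sketch assumes the OCF/ECF ratio is computed in the commonly used way, as the observed CpG count divided by (C count × G count) / sequence length. The defaults follow the 400-nucleotide standard described above, and passing `min_length=200` gives the less stringent definition. The function names are illustrative.

```python
def cpg_island_stats(seq: str):
    """Return (GC content, observed/expected CpG ratio) for a DNA sequence."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0.0, 0.0
    c, g = seq.count("C"), seq.count("G")
    observed_cpg = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    expected_cpg = c * g / n        # expected CpG count if C and G were independent
    gc_content = (c + g) / n
    obs_exp = observed_cpg / expected_cpg if expected_cpg else 0.0
    return gc_content, obs_exp


def is_cpg_island(seq: str, min_length: int = 400,
                  min_gc: float = 0.5, min_obs_exp: float = 0.6) -> bool:
    """Apply length (>=), GC-content (>), and OCF/ECF (>) thresholds."""
    gc_content, obs_exp = cpg_island_stats(seq)
    return len(seq) >= min_length and gc_content > min_gc and obs_exp > min_obs_exp
```

For example, a 300-nucleotide sequence that passes both ratio tests fails the 400-nucleotide standard but satisfies the 200-nucleotide one.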

A “CpG island shore,” as used herein, refers to a methylation hotspot that is present a short distance, e.g., less than 2 kb, from a CpG island.
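To illustrate the shore definition, the following sketch labels a genomic position as island, shore, or other. It assumes islands are supplied as half-open (start, end) coordinate intervals on the same chromosome as the position, and it uses the 2 kb example distance as the default `shore_width`. The function is hypothetical and not part of any annotation tool.

```python
def annotate_site(pos, islands, shore_width=2000):
    """Label a position as 'island', 'shore', or 'other'.

    islands: iterable of half-open (start, end) CpG island intervals on the
             same chromosome as pos.
    A position counts as shore when its distance to the nearest island edge
    is less than shore_width bases.
    """
    nearest = None
    for start, end in islands:
        if start <= pos < end:
            return "island"
        # Distance in bases from pos to the closest base of this island.
        dist = start - pos if pos < start else pos - (end - 1)
        nearest = dist if nearest is None else min(nearest, dist)
    if nearest is not None and nearest < shore_width:
        return "shore"
    return "other"
```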

The term “methylome,” as used herein, refers to the amount or pattern of methylation at different sites or regions within a population of cells. The methylome can correspond to all of the genome, a subset of the genome (e.g., repeat elements in the genome), or a portion of the subset (e.g., those areas found to be associated with ovarian cancer). A methylome from plasma can be referred to as a “plasma fluid methylome” or a “plasma fluid DNA methylome.” The plasma fluid methylome is an example of a cell-free methylome that includes cell-free DNA (cfDNA).

As used herein, the term “increase” means to alter positively by at least about 2%, including, but not limited to, altering positively by about 5%, by about 10%, by about 15%, by about 20%, by about 25%, by about 30%, by about 35%, by about 40%, by about 45%, by about 50%, by about 55%, by about 60%, by about 65%, by about 70%, by about 75%, by about 80%, by about 85%, by about 90%, by about 95% or by about 100%.

As used herein, the terms “reduce,” “reduction,” or “decrease” mean to alter negatively by at least about 2%, including, but not limited to, altering negatively by about 5%, by about 10%, by about 15%, by about 20%, by about 25%, by about 30%, by about 35%, by about 40%, by about 45%, by about 50%, by about 55%, by about 60%, by about 65%, by about 70%, by about 75%, by about 80%, by about 85%, by about 90%, by about 95% or by about 100%.
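These definitions can be read in more than one way. The sketch below assumes that the percentages describe a change relative to the starting (reference) value, not a difference in percentage points, and it uses the 2% lower bound from the definitions as the default threshold. The function names are illustrative.

```python
def relative_change(reference: float, observed: float) -> float:
    """Signed fractional change of the observed value relative to the reference."""
    if reference == 0:
        raise ValueError("reference value must be non-zero")
    return (observed - reference) / reference


def classify_change(reference: float, observed: float, min_change: float = 0.02) -> str:
    """Label a change as 'increase', 'decrease', or 'no change'."""
    change = relative_change(reference, observed)
    if change >= min_change:
        return "increase"
    if change <= -min_change:
        return "decrease"
    return "no change"
```

Under this reading, a methylation level moving from 0.50 to 0.60 is a 20% increase, whereas 0.50 to 0.505 (a 1% change) falls below the threshold.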

As described herein, this Example provides methods for diagnosing, monitoring, classifying, and/or treating ovarian cancer by analyzing the methylation status of one or more genomic loci in a sample (e.g., a plasma sample, a urine sample, a peritoneal fluid sample, or a cervical swab sample) of a mammal (e.g., a human female). In certain embodiments, the methods can include using an algorithm described herein. In certain embodiments, the methods described herein can allow for the early diagnosis or screening of a subject with ovarian cancer, e.g., a subject who does not have any symptoms or has only early symptoms of ovarian cancer.

In certain embodiments, samples obtained for use in the methods described herein can include cfDNA, which carries DNA methylation information from the cell of origin. cfDNA can arise from cellular apoptosis and necrosis, and can also be generated by active secretory processes, including the formation of extracellular vesicles. DNA methylation signatures are highly tissue-specific and carry in vivo information about the tissue source of cfDNA. In certain embodiments, the methods described herein can include analyzing cfDNA in a sample (e.g., a plasma sample, a urine sample, a peritoneal fluid sample, or a cervical swab sample) to identify genetic phenotypes that are drivers and/or consequences of ovarian cancer.

The sample from the subject can be collected using any appropriate technique. For example, a blood sample, a plasma sample, a urine sample, a peritoneal fluid sample, or a cervical swab sample can be collected using standard methods. In some cases, the sample can be collected from the subject before the subject has any symptom of ovarian cancer, i.e., a non-symptomatic subject. In certain embodiments, the sample can be collected from a non-symptomatic subject who is at high risk of ovarian cancer. In certain embodiments, the sample can be collected from a subject who has previously received or is currently receiving a treatment for ovarian cancer. In certain embodiments, two or more samples (e.g., two or more, three or more, four or more, five or more, six or more or seven or more samples) can be obtained before and while the subject is receiving a treatment for ovarian cancer (e.g., serially obtained samples).

Diagnostic, Prognostic, Classification, and Monitoring Methods of this Example

This Example provides diagnostic and prognostic methods for diseases and/or disorders that are characterized by differential methylation of genomic loci. For example, this Example provides methods for diagnosing, prognosing, classifying, and/or monitoring ovarian cancer in a subject that include analyzing the methylation status of certain genomic loci.

In certain embodiments, the analyzed genomic loci can include one or more genomic loci that exhibit differential methylation in a sample from a subject that has ovarian cancer compared to a reference sample. For example, the methods described herein can include assessing the methylation status of one or more genomic loci, e.g., about 5 or more, about 10 or more, about 50 or more, about 100 or more, about 500 or more, about 1,000 or more, about 5,000 or more, about 10,000 or more, about 25,000 or more, about 50,000 or more or about 100,000 or more genomic loci in a sample of a subject. In certain embodiments, the genomic loci can be selected from the genes, or a region within the genes, provided in Table 18 and/or Example Embodiment K. In certain embodiments, the one or more genomic loci can be one or more promoter regions of one or more genes, one or more exons of one or more genes, one or more introns of one or more genes, one or more CpG sites, one or more CpG islands, one or more CpG island shores, one or more enhancers of one or more genes, or a combination thereof. In certain embodiments, the genomic loci are present on a particular chromosome.

In certain embodiments, this Example provides methods for diagnosing, prognosing, and/or monitoring ovarian cancer in a subject by detecting the DNA methylation profiles associated with ovarian cancer. In certain embodiments, the methods described herein can include (a) obtaining a sample from the subject, (b) determining the methylation status of one or more genomic loci present in the sample, e.g., present within cfDNA in a plasma sample, a urine sample, a peritoneal fluid sample, or a cervical swab sample, (c) comparing the methylation status of the one or more genomic loci to a reference or a set of predicted values, and (d) diagnosing ovarian cancer in the subject. In certain embodiments, the difference in the methylation status of the one or more genomic loci in the sample compared to the reference or a set of predicted values indicates the presence of ovarian cancer in the subject. In certain embodiments, the difference in the methylation status also can indicate the severity of ovarian cancer.

In certain embodiments, the methods described herein for diagnosing, prognosing, and/or monitoring ovarian cancer in a subject can include (a) obtaining a sample from the subject, (b) determining the level of methylation of one or more genomic loci present in the sample, (c) comparing the level of methylation of the one or more genomic loci to a reference or a set of predicted values, and (d) diagnosing ovarian cancer in the subject. In certain embodiments, the difference in the level of methylation of the one or more genomic loci in the sample compared to the reference or a set of predicted values indicates the presence of ovarian cancer in the subject. In certain embodiments, the difference in the methylation level also can indicate the severity of ovarian cancer.
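Steps (c) and (d) can be sketched as follows. The sketch assumes methylation levels are expressed as fractions between 0 and 1 keyed by a locus identifier, that the reference may be either measured reference levels or a set of predicted values, and that a sample is flagged when at least `min_loci` loci differ by at least `min_difference` (an absolute difference in fraction units). Both thresholds, the locus identifiers, and the function names are placeholders rather than values taken from this Example.

```python
from typing import Mapping


def compare_to_reference(sample: Mapping[str, float],
                         reference: Mapping[str, float],
                         min_difference: float = 0.10) -> dict:
    """Step (c): return {locus: sample - reference} for loci whose methylation
    level differs from the reference (or predicted value) by >= min_difference."""
    diffs = {}
    for locus, ref_level in reference.items():
        if locus in sample:
            d = sample[locus] - ref_level
            if abs(d) >= min_difference:
                diffs[locus] = d
    return diffs


def call_sample(sample, reference, min_difference=0.10, min_loci=1):
    """Step (d): flag the sample when enough loci differ from the reference."""
    differing = compare_to_reference(sample, reference, min_difference)
    return len(differing) >= min_loci, differing
```

The sign of each returned difference records whether the locus is higher or lower in the sample than in the reference, which the embodiments above treat as separately meaningful.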

In certain embodiments, diagnosing ovarian cancer in the subject can include characterizing a phenotype of the ovarian cancer, wherein the difference in the methylation status of the one or more genomic loci in the sample compared to the reference or a set of predicted values indicates the phenotype of the ovarian cancer. In certain embodiments, the phenotype of the ovarian cancer can include the severity of the ovarian cancer, prognosis of the ovarian cancer, molecular expression profile of the ovarian cancer, responsiveness of the ovarian cancer to certain treatments, or any combination thereof.

In certain embodiments, the methods described herein for determining whether a subject is at risk of developing ovarian cancer can include (a) obtaining a sample from the subject, (b) determining the level of methylation of one or more genomic loci present in the sample, (c) comparing the level of methylation of the one or more genomic loci to a reference or a set of predicted values, and (d) determining that the subject is at risk of developing ovarian cancer, wherein the difference in the level of methylation of the one or more genomic loci in the sample compared to the reference or a set of predicted values indicates that the subject is at risk.

In certain embodiments, diagnosing, prognosing, and/or monitoring of a subject with ovarian cancer can be based on a higher or lower methylation level of the genomic locus in the sample of the subject relative to the methylation level in a reference sample, e.g., a sample from a subject that does not have ovarian cancer. In certain embodiments, a difference of greater than about 5%, greater than about 10%, greater than about 15%, greater than about 20%, greater than about 25%, greater than about 30%, greater than about 35%, greater than about 40%, greater than about 45%, greater than about 50%, greater than about 55%, greater than about 60%, greater than about 65%, greater than about 70%, greater than about 75%, greater than about 80%, greater than about 85%, greater than about 90% or greater than about 95% in the methylation (e.g., level, percentage, and/or fraction) of the one or more genomic loci in a sample obtained from a subject compared to a control can be indicative that the subject has ovarian cancer or is at risk of developing ovarian cancer. In certain embodiments, the difference can be a decrease in methylation (e.g., level, percentage, and/or fraction) of the genomic loci in the sample of the subject. Alternatively, the difference can be an increase in methylation (e.g., level, percentage, and/or fraction) of the genomic loci in the sample of the subject. In certain embodiments, the difference can be a decrease in the methylation of a genomic locus and an increase in the methylation of a different genomic locus in the sample obtained from the subject. In certain embodiments, a decrease in the level of methylation of one or more genomic loci in the sample and an increase in the level of methylation of one or more different genomic loci in the sample can indicate the presence of ovarian cancer.
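The mixed pattern described last (lower methylation at some loci together with higher methylation at different loci) can be expressed as a simple rule. This sketch assumes the loci expected to lose and gain methylation are already known, and it treats the threshold as an absolute difference in methylation fraction (0.05, i.e., 5 percentage points). The function name and locus identifiers are hypothetical.

```python
def mixed_direction_pattern(sample, reference, hypo_loci, hyper_loci, threshold=0.05):
    """Return True when every locus in hypo_loci is lower than the reference and
    every locus in hyper_loci is higher than the reference, each by more than
    `threshold` (absolute difference in methylation fraction)."""
    if not hypo_loci or not hyper_loci:
        raise ValueError("both locus sets must be non-empty")
    lower = all(reference[locus] - sample[locus] > threshold for locus in hypo_loci)
    higher = all(sample[locus] - reference[locus] > threshold for locus in hyper_loci)
    return lower and higher
```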

In certain embodiments, diagnosis of a subject with ovarian cancer can be based on the methylated or unmethylated state of a genomic locus, e.g., a CpG site. In certain embodiments, a genomic locus, e.g., a CpG site, in a sample from a subject diagnosed with ovarian cancer can be methylated and the genomic locus, e.g., the CpG site, in a reference sample can be unmethylated. In certain embodiments, a genomic locus in a sample from a subject diagnosed with ovarian cancer can be unmethylated and the genomic locus in a reference sample can be methylated.

Diagnostic, Prognostic, Classification, and Monitoring Methods Using an Algorithm of this Example

This Example also provides diagnostic and prognostic methods for diseases and/or disorders that are characterized by differential methylation of genomic loci by using an algorithm as described, for example, in Example Embodiment L. For example, this Example provides methods for diagnosing, prognosing, classifying, and/or monitoring ovarian cancer in a subject that includes analyzing the methylation status of certain genomic loci and/or genomic fractions.

Methods for Treating Ovarian Cancer

This Example also provides methods for treating a subject having ovarian cancer. For example, a mammal (e.g., a human female) that was identified as having ovarian cancer as described herein (or identified as being at risk of developing ovarian cancer as described herein) can be administered one or more chemotherapies, one or more immunotherapies, or a combination thereof to treat ovarian cancer. Examples of chemotherapies that can be used as described herein to treat ovarian cancer include, without limitation, carboplatin, cisplatin, paclitaxel, and docetaxel. Examples of immunotherapies that can be used as described herein to treat ovarian cancer include, without limitation, bevacizumab and durvalumab. In some cases, a mammal (e.g., a human female) that was identified as having ovarian cancer as described herein (or identified as being at risk of developing ovarian cancer as described herein) can be treated using a surgical procedure to treat ovarian cancer. Examples of surgical procedures that can be used as described herein to treat ovarian cancer include, without limitation, surgeries to remove one or both ovaries with or without removing the uterus.

In some cases, the information provided by the methods described herein can be used by a clinician or physician in determining the most effective course of treatment (e.g., preventative or therapeutic) for the subject. A course of treatment refers to the measures taken for a patient after the prognosis or the assessment of increased risk for development of ovarian cancer is made. For example, when a subject is identified to have an increased risk of developing ovarian cancer, the physician can determine whether frequent monitoring of DNA methylation changes can be performed as a prophylactic measure. Also, when a subject is diagnosed with ovarian cancer (e.g., based on the presence of a DNA methylation pattern in a sample from a subject), it can be advantageous to follow such detection with a therapeutic treatment.

In some cases, this Example provides methods for assessing the efficacy of a therapeutic or prophylactic therapy for treating ovarian cancer in a subject, comprising determining the methylation status of one or more genomic loci present in a sample obtained from a subject prior to the therapy and determining the methylation status of the one or more genomic loci present in a sample obtained from the subject at one or more time points during the therapeutic or prophylactic therapy, wherein the therapy is efficacious for treating ovarian cancer in a subject when there is a change in the presence and/or level of methylation of the one or more genomic loci in the second or subsequent samples, relative to the first sample. In certain embodiments, the first sample is obtained after therapeutic treatment has begun.

In certain embodiments, the methods for monitoring the response in a subject to prophylactic or therapeutic treatment of ovarian cancer can include measuring the methylation status and/or level of one or more genomic loci in a sample of a subject at a first time-point, administering a therapeutic agent, re-measuring the methylation status and/or level of the one or more genomic loci at a second time-point, comparing the results of the first and second measurements and optionally modifying the treatment regimen based on the comparison. In certain embodiments, the first time-point can be prior to an administration of the therapeutic agent, and the second time-point can be after said administration of the therapeutic agent. In certain embodiments, the first time-point can be prior to the administration of the therapeutic agent to the subject for the first time. In certain embodiments, the dose (defined as the quantity of therapeutic agent administered at any one administration) can be increased or decreased in response to the comparison. In certain embodiments, the dosing interval (defined as the time between successive administrations) can be increased or decreased in response to the comparison, including total discontinuation of treatment. In addition, the methods described herein can be used to determine the efficacy of the therapeutic treatment, wherein a change in the methylation status of certain genomic loci present in a sample of a subject can indicate that the therapeutic treatment regimen can be altered, reduced, and/or stopped.
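The comparison step can be sketched for illustration only. The sketch assumes methylation levels at two time points keyed by locus, and it takes, for each locus, the direction of change that indicates response (+1 for an increase, -1 for a decrease) as an input, because this Example contemplates either direction. The `min_change` threshold, the returned labels, and the function name are hypothetical, and the output is a flag for clinical review rather than a treatment decision.

```python
def assess_response(before, during, responsive_direction, min_change=0.05):
    """Compare methylation at two time points.

    before, during: dicts mapping locus ID -> methylation fraction.
    responsive_direction: dict mapping locus ID -> +1 if an increase indicates
        response, -1 if a decrease indicates response.
    Returns 'continue' if every monitored locus moved in its responsive
    direction by at least min_change, otherwise 'modify'.
    """
    if not responsive_direction:
        raise ValueError("no loci to monitor")
    for locus, direction in responsive_direction.items():
        change = during[locus] - before[locus]
        if direction * change < min_change:
            return "modify"
    return "continue"
```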

Assays of this Example

This Example also provides assays and/or methods for determining the DNA methylation status and/or level of genomic loci that correlate with the presence, absence, and/or severity of ovarian cancer. In some cases, the assay method can include comparing the methylation status and/or level of genomic loci present in a sample from a subject that has ovarian cancer to the methylation status and/or level of genomic loci in a sample from a healthy subject to determine the methylation pattern, as described above, that correlates with the presence of ovarian cancer. In some cases, the assay methods can include comparing the methylation status and/or level of genomic loci in a sample from a subject that has ovarian cancer at an early stage to the methylation status and/or level of genomic loci in a sample from a subject that has ovarian cancer at a late stage to determine the methylation status and/or level that correlates with the different stages and/or severity of ovarian cancer.
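One way to screen for loci whose methylation differs between subjects with ovarian cancer and healthy subjects is sketched below. As an assumption not taken from this Example, it ranks loci by the absolute Welch t-statistic of the difference in mean methylation fraction between the two groups. It reports no p-values and applies no multiple-testing correction, both of which a real analysis would need; the function name is hypothetical.

```python
import numpy as np


def rank_differential_loci(cases: np.ndarray, controls: np.ndarray, top_k: int = 10):
    """Rank loci by |Welch t| between two groups.

    cases:    (n_cases, n_loci) methylation fractions from affected subjects.
    controls: (n_controls, n_loci) methylation fractions from healthy subjects.
    Returns (indices of the top_k loci, their case-minus-control mean differences).
    """
    mean_diff = cases.mean(axis=0) - controls.mean(axis=0)
    se = np.sqrt(cases.var(axis=0, ddof=1) / len(cases)
                 + controls.var(axis=0, ddof=1) / len(controls))
    # Loci with zero variance in both groups get t = 0 instead of a division error.
    t = np.divide(mean_diff, se, out=np.zeros_like(mean_diff), where=se > 0)
    order = np.argsort(-np.abs(t))[:top_k]
    return order, mean_diff[order]
```

The same function could compare early-stage with late-stage groups by passing those matrices instead.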

DNA Isolation Techniques of this Example

In certain embodiments, the methods described herein can include isolating nucleic acid from a sample (e.g., a plasma sample, a urine sample, a peritoneal fluid sample, or a cervical swab sample) obtained from a subject. Any appropriate technique can be used to isolate nucleic acids from a sample. For example, isolation of DNA from a plasma sample can be performed by extraction methods using organic solvents such as a mixture of phenol and chloroform, followed by precipitation with ethanol (see, for example, J. Sambrook et al., “Molecular Cloning: A Laboratory Manual,” 1989, 2nd Ed., Cold Spring Harbor Laboratory Press: New York, N.Y.). Additional non-limiting examples include salting out DNA extraction (see, for example, P. Sunnucks et al., Genetics, 1996, 144:747-756; and S. M. Aljanabi and I. Martinez, Nucl. Acids Res. 1997, 25:4692-4693), the trimethylammonium bromide salts DNA extraction method (see, for example, S. Gustincich et al., BioTechniques, 1991, 11:298-302), and the guanidinium thiocyanate DNA extraction method (see, for example, J. B. W. Hammond et al., Biochemistry, 1996, 240:298-300). There are also numerous commercially available kits that can be used to extract DNA from biological fluids (e.g., plasma samples) or cells. For example, Qiagen's Gentra PureGene Cell Kit, QIAamp Circulating Nucleic Acid Kit, QIAamp DNA Mini Kit, DNeasy Blood and Tissue Kit or QIAamp DNA Blood Mini Kit (Qiagen, Hilden, Germany), GenomicPrep™ Blood DNA Isolation Kit (Promega, Madison, Wis.) and GFX™ Genomic Blood DNA Purification Kit (Amersham, Piscataway, N.J.) can be used to obtain DNA from a sample from a subject.

Methylation Detection Techniques of this Example

Various methylation analysis procedures are known in the art and can be used with the methods described herein. These assays allow for determination of the methylation state of one or more genomic loci, e.g., one or more CpG sites or islands within a nucleic acid obtained from a sample. In addition, the methods can be used to quantify the methylation of a genomic locus. Such assays involve, among other techniques, DNA sequencing of bisulfite-treated DNA, PCR (for sequence-specific amplification), digital PCR, and the use of methylation-sensitive restriction enzymes.

In certain embodiments, methylation-specific PCR can be used to determine the methylation status of a genomic locus. Methylation-specific PCR is based on a chemical reaction of sodium bisulfite with DNA that converts unmethylated cytosines, e.g., of CpG dinucleotides, to uracil (converting CpG to UpG), followed by traditional PCR. Methylated cytosines are not converted in this process, and primers can be designed to overlap the methylation site, e.g., CpG site, of interest, thereby allowing one to classify the methylation site as methylated or unmethylated. Additionally, restriction enzyme digestion of PCR products amplified from bisulfite-converted DNA may be used, e.g., by using the method described by Sadri & Hornsby (Nucl. Acids Res. 1996; 24:5058-5059) or COBRA (Combined Bisulfite Restriction Analysis) (Xiong & Laird, Nucleic Acids Res. 1997; 25:2532-2534).
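The effect of bisulfite treatment on a template, which is what methylation-specific primers exploit, can be simulated in a few lines. This sketch assumes complete conversion of a single strand and takes 0-based positions of methylated cytosines as input. It is illustrative only and does not model incomplete conversion or strand-specific primer design; the function names are hypothetical.

```python
def bisulfite_convert(seq: str, methylated_positions) -> str:
    """Simulate complete bisulfite conversion of one DNA strand (5' to 3').

    Unmethylated C becomes U; cytosines at the 0-based positions listed in
    methylated_positions are protected and stay C.
    """
    return "".join(
        "U" if base == "C" and i not in methylated_positions else base
        for i, base in enumerate(seq.upper())
    )


def after_pcr(converted: str) -> str:
    """PCR copies uracil as thymine, so amplified products read T in its place."""
    return converted.replace("U", "T")
```

For the template `ACGTCGCA` with only the first CpG methylated, conversion gives `ACGTUGUA` and amplification gives `ACGTTGTA`: the retained C at the methylated site is what a methylation-specific primer would be designed to match.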

In certain embodiments, whole genome bisulfite sequencing, a high-throughput genome-wide analysis of DNA methylation, can be used to determine the methylation status of multiple genomic loci. It is based on sodium bisulfite conversion of genomic DNA, as described above, followed by sequencing on a next-generation sequencing platform. The resulting reads are then aligned to the reference genome, and the methylation states of cytosines (e.g., of CpG dinucleotides) within the analyzed genomic loci are determined from the mismatches that result from the conversion of unmethylated cytosines into uracil.
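The mismatch-based calling described above can be sketched in a few lines of Python. This is a simplified illustration, not how production aligners work: it assumes gapless, forward-strand reads that have already been aligned (each given as a hypothetical `(start, sequence)` pair) and ignores base quality, reverse-strand (G/A) evidence, and indels, all of which tools such as Bismark handle. At each reference CpG cytosine, a read `C` counts as methylated and a read `T` as unmethylated; the methylation level is the methylated fraction.

```python
from collections import defaultdict


def call_cpg_methylation(reference: str, reads: list[tuple[int, str]]) -> dict[int, float]:
    """Per-CpG methylation level from gapless, forward-strand bisulfite reads.

    Each read is (0-based alignment start, read sequence). At a reference CpG
    cytosine, a read 'C' means the cytosine was protected (methylated) and a
    read 'T' means it was converted (unmethylated); other bases are ignored.
    """
    reference = reference.upper()
    meth = defaultdict(int)
    unmeth = defaultdict(int)
    for start, read in reads:
        for offset, base in enumerate(read.upper()):
            pos = start + offset
            # Only score cytosines that are the C of a CpG in the reference.
            if reference[pos:pos + 2] != "CG":
                continue
            if base == "C":
                meth[pos] += 1
            elif base == "T":
                unmeth[pos] += 1
    levels = {pos: meth[pos] / (meth[pos] + unmeth[pos]) for pos in set(meth) | set(unmeth)}
    return dict(sorted(levels.items()))


reads = [(0, "ACGTTTGA"), (1, "TGTTCG"), (4, "TCG")]
print(call_cpg_methylation("ACGTTCGACCA", reads))
```

With these toy reads, the CpG at position 1 is covered by one C and one T (level 0.5) and the CpG at position 5 by two C and one T (level about 0.67).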

In certain embodiments, genome-wide DNA methylation profiling can be performed using commercially available arrays, allowing the interrogation of multiple genomic loci (e.g., multiple CpG sites). Non-limiting examples of such arrays include HumanMethylation BeadChips (Illumina, San Diego, Calif.) and the Infinium MethylationEPIC kit (Illumina). Additional methods for analyzing the methylation state of multiple genomic loci are provided in Yong et al., Epigenetics & Chromatin 2016; 9:26, which is incorporated by reference herein.

Kits of this Example

This Example provides kits for diagnosing, monitoring, classifying, and/or treating a subject with ovarian cancer. The kits described herein can comprise a means for determining and/or detecting the methylation status of one or more genomic loci.

Kits of this Example can include, without limitation, packaged probe and primer sets (e.g., TaqMan probe/primer sets), arrays/microarrays, which further contain one or more probes, primers, or other detection reagents for determining the methylation state and/or level of one or more genomic loci. For example, a kit described herein can include one or more probes or primers for detecting the methylation state of one or more genomic loci. In certain embodiments, the one or more genomic loci comprise a CpG site. In certain embodiments, one or more of the genomic loci do not comprise a CpG site. For example, about 5% or more, about 10% or more, about 15% or more, about 20% or more, about 25% or more, about 30% or more, about 35% or more, about 40% or more, about 45% or more, about 50% or more, about 55% or more, about 60% or more, about 65% or more or about 70% or more of the one or more genomic loci detected by the primers or probes can comprise one or more CpG sites.

In certain non-limiting embodiments, a primer and/or probe described herein can be at least about 10 nucleotides or at least about 15 nucleotides or at least about 20 nucleotides in length, and/or up to about 200 nucleotides or up to about 150 nucleotides or up to about 100 nucleotides or up to about 75 nucleotides or up to about 50 nucleotides in length.

In a further non-limiting embodiment, the oligonucleotide primers and/or probes can be immobilized on a solid surface or support, for example, on a nucleic acid microarray, wherein the position of each oligonucleotide primer and/or probe bound to the solid surface or support is known and identifiable.

In certain non-limiting embodiments, the kits described herein can additionally include other components such as a buffer, enzymes such as DNA polymerases or ligases, nucleotides such as deoxynucleotide triphosphates, positive control sequences, and/or negative control sequences necessary to carry out an assay or reaction to detect the methylation state of a genomic locus.

In certain embodiments, the kits described herein can include a container comprising one or more probes and/or primers for detecting the methylation state of one or more genomic loci. In certain embodiments, the kits further include instructions for use, e.g., the instructions can describe that a particular methylation status of a genomic locus is indicative of ovarian cancer in a subject. The instructions can be printed directly on the container (when present), or as a label applied to the container, or as a separate sheet, pamphlet, card or folder supplied in or with the container.

Reports, Programmed Computers, and Systems of this Example

In certain embodiments, a diagnosis and/or monitoring of ovarian cancer in a subject based on the methylation status of one or more genomic loci as described herein can be referred to herein as a “report.” A tangible report can optionally be generated as part of a testing process (which can be interchangeably referred to herein as “reporting,” or as “providing” a report, “producing” a report or “generating” a report).

Examples of tangible reports can include, without limitation, reports in paper (such as computer-generated printouts of test results) or equivalent formats and reports stored on computer readable medium (such as a CD, USB flash drive or other removable storage device, computer hard drive, or computer network server, etc.). Reports, particularly those stored on computer readable medium, can be part of a database, which can optionally be accessible via the internet (such as a database of patient records or genetic information stored on a computer network server, which can be a “secure database” that has security features that limit access to the report, such as to allow only the patient and the patient's medical practitioners to view the report while preventing other unauthorized individuals from viewing the report, for example). In addition to, or as an alternative to, generating a tangible report, reports can also be displayed on a computer screen (or the display of another electronic device or instrument).

A report can include, for example, an individual's medical history, or can just include size, presence, absence, or levels of one or more markers (for example, a report on computer readable medium such as a network server can include hyperlink(s) to one or more journal publications or websites that describe the medical/biological implications). Thus, for example, the report can include information of medical/biological significance as well as optionally also including information regarding the methylation status of relevant genomic loci, or the report can just include information regarding the methylation status of relevant genomic loci without other medical/biological significance.

A report can further be “transmitted” or “communicated” (these terms can be used herein interchangeably), such as to the individual who was tested, a medical practitioner (e.g., a doctor, nurse, clinical laboratory practitioner, genetic counselor, etc.), a healthcare organization, a clinical laboratory, and/or any other party or requester intended to view or possess the report. The act of “transmitting” or “communicating” a report can be by any means known in the art, based on the format of the report. Furthermore, “transmitting” or “communicating” a report can include delivering a report (“pushing”) and/or retrieving (“pulling”) a report. For example, reports can be transmitted/communicated by various means, including being physically transferred between parties (such as for reports in paper format) such as by being physically delivered from one party to another, or by being transmitted electronically or in signal form (e.g., via e-mail or over the internet, by facsimile, and/or by any wired or wireless communication methods known in the art) such as by being retrieved from a database stored on a computer network server, etc.

In certain embodiments, the disclosed subject matter provides computers (or other apparatus/devices such as biomedical devices or laboratory instrumentation) programmed to carry out the methods described herein, e.g., to perform the algorithm of this Example (see Example Embodiment L). In certain embodiments, the system can be controlled by the individual and/or their medical practitioner in that the individual and/or their medical practitioner requests the test, receives the test results back and (optionally) acts on the test results to reduce the individual's ovarian cancer risk or treat the individual, such as by implementing an ovarian cancer management system.

Example Embodiment K of Example 6: Non-Invasive Molecular Phenotyping of Human Plasma for Ovarian Cancer

An epigenomic analysis was performed using liquid biopsies of plasma DNA samples obtained from human ovarian cancer patients (n=9) and healthy human controls (n=11). Bisulfite-converted DNA libraries underwent targeted capture using the SeqCap Epi CpGiant Enrichment Kit (Roche, Pleasanton, Calif.) and were sequenced to a read depth of ~50×. This approach targets 80.5 Mb of the human genome and ~5.5 million individual CpG sites. The Bismark Bisulfite Read Mapper (ref. 34) was used to align the libraries against the GRCh38 reference genome and to determine the methylation status of each CpG site.
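Per-CpG methylation status from such a pipeline is commonly summarized as counts of methylated and unmethylated reads at each site. As a hedged sketch, the Python code below reads lines in the six-column coverage layout documented for Bismark's methylation extractor (chromosome, start, end, methylation percentage, methylated count, unmethylated count; tab-separated) and converts them to methylation fractions, discarding low-coverage sites. The function name and the 10-read depth threshold are hypothetical choices for illustration, not parameters from this Example. The two input lines are toy values, not data from this study.

```python
import io


def read_bismark_cov(handle, min_depth: int = 10) -> dict[tuple[str, int], float]:
    """Parse coverage lines into {(chrom, pos): fraction methylated}.

    Assumed columns (tab-separated): chromosome, start, end, methylation
    percentage, count methylated, count unmethylated. Sites with fewer than
    `min_depth` informative reads are dropped (threshold is illustrative).
    """
    levels = {}
    for line in handle:
        if not line.strip():
            continue
        chrom, start, _end, _pct, n_meth, n_unmeth = line.rstrip("\n").split("\t")
        n_meth, n_unmeth = int(n_meth), int(n_unmeth)
        depth = n_meth + n_unmeth
        if depth < min_depth:
            continue
        levels[(chrom, int(start))] = n_meth / depth
    return levels


example = io.StringIO(
    "chrA\t100\t100\t40.0\t20\t30\n"
    "chrA\t200\t200\t50.0\t2\t2\n"
)
print(read_bismark_cov(example))  # {('chrA', 100): 0.4}
```

The fraction is recomputed from the counts rather than taken from the percentage column, so the same depth filter and arithmetic apply to every site.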

These analyses demonstrated the ability to generate high-resolution epigenomic liquid biopsy data from cell-free DNA samples obtained from plasma. Examples of ovarian cancer-specific differentially methylated loci are shown in Table 18. The listed CpG sites are examples of specific DNA methylation differences that may be detected in the relevant biological samples described herein. They demonstrate that DNA methylation differences such as these, and many others, exist and can be used to differentiate samples as described herein, for example, using procedures similar to those set forth in Example Embodiment L.

TABLE 18 Examples of differentially methylated CpG sites between normal plasma and ovarian cancer plasma.

Chromosome | Coordinate | p-value | q-value | % Methylation Difference (Case-Control)
chr1 | 3366144 | 2.64E−06 | 0.066643 | −13.7369
chr1 | 3366307 | 2.84E−06 | 0.069386 | −10.1065
chr1 | 6239344 | 4.43E−06 | 0.090961 | 5.92168
chr1 | 15048599 | 3.50E−06 | 0.076076 | 5.372909
chr1 | 110675458 | 2.43E−06 | 0.063373 | −13.7349
chr1 | 110675513 | 1.29E−06 | 0.039991 | −16.7303
chr1 | 110675581 | 2.10E−07 | 0.013145 | −18.3871
chr1 | 110675587 | 2.28E−07 | 0.013199 | −18.4758
chr1 | 110675617 | 5.25E−07 | 0.022788 | −19.1298
chr1 | 125182646 | 2.22E−06 | 0.060231 | −7.31516
chr1 | 125182662 | 1.30E−08 | 0.001258 | −7.57844
chr1 | 125182668 | 3.62E−08 | 0.002919 | −7.66042
chr1 | 125182688 | 9.75E−07 | 0.035115 | −7.94721
chr1 | 143213094 | 3.34E−06 | 0.074525 | −8.37223
chr1 | 143238114 | 4.78E−06 | 0.095 | −8.73461
chr1 | 143264564 | 4.80E−06 | 0.095 | −6.5326
chr1 | 227969662 | 1.05E−06 | 0.036894 | −9.434
chr10 | 10266910 | 9.63E−11 | 2.33E−05 | −21.8125
chr10 | 10266955 | 2.33E−08 | 0.002033 | −19.4378
chr10 | 10267015 | 4.32E−06 | 0.089071 | −13.0664
chr10 | 42079914 | 1.56E−06 | 0.045245 | −7.42861
chr10 | 48465484 | 1.13E−06 | 0.037592 | −12.8601
chr10 | 95184074 | 1.11E−07 | 0.007959 | −13.0014
chr10 | 95184082 | 6.62E−07 | 0.026342 | −13.3685
chr10 | 95184085 | 1.48E−07 | 0.009912 | −13.4852
chr10 | 95184109 | 6.44E−07 | 0.026245 | −14.1042
chr10 | 95184115 | 1.08E−06 | 0.037293 | −14.1683
chr10 | 95184166 | 2.91E−06 | 0.069394 | −13.8895
chr10 | 125895681 | 4.20E−13 | 2.03E−07 | −17.2557
chr10 | 125895682 | 9.72E−07 | 0.035115 | −17.1847
chr10 | 125895696 | 5.64E−17 | 6.36E−11 | −15.4744
chr10 | 125895697 | 4.86E−17 | 6.36E−11 | −15.3924
chr10 | 125895708 | 2.36E−18 | 7.99E−12 | −14.9818
chr10 | 125895713 | 3.07E−13 | 1.73E−07 | −14.0875
chr10 | 125895714 | 6.18E−09 | 0.000675 | −14.0002
chr10 | 125895715 | 6.78E−14 | 4.59E−08 | −13.9126
chr10 | 125895716 | 9.99E−13 | 3.76E−07 | −13.8246
chr10 | 125895747 | 2.88E−06 | 0.069394 | −12.216
chr11 | 68209486 | 2.29E−07 | 0.013199 | −5.62643
chr12 | 5039987 | 2.60E−06 | 0.06663 | −15.3903
chr12 | 5039991 | 3.79E−06 | 0.080237 | −15.4107
chr12 | 5666506 | 3.02E−06 | 0.070988 | −17.206
chr12 | 6215837 | 2.02E−06 | 0.056581 | −17.01
chr12 | 40693719 | 3.37E−06 | 0.074525 | −15.8434
chr12 | 40693738 | 1.53E−06 | 0.044858 | −16.4129
chr12 | 40693739 | 4.27E−08 | 0.003364 | −16.4424
chr12 | 40693782 | 5.05E−06 | 0.099323 | −16.6923
chr12 | 132394139 | 5.25E−07 | 0.022788 | −12.3098
chr12 | 132394165 | 3.36E−06 | 0.074525 | −12.4282
chr12 | 132394178 | 2.81E−07 | 0.013996 | −12.107
chr12 | 132394181 | 5.80E−07 | 0.024225 | −12.062
chr12 | 132394190 | 5.73E−07 | 0.024225 | −11.8616
chr12 | 132394229 | 3.18E−08 | 0.002629 | −11.3937
chr14 | 32938088 | 1.29E−06 | 0.039991 | −14.9034
chr14 | 64442345 | 1.96E−06 | 0.055266 | −13.0625
chr15 | 52256671 | 3.33E−06 | 0.074525 | −23.4156
chr16 | 12390340 | 1.87E−07 | 0.012171 | −15.1675
chr16 | 34576408 | 4.73E−06 | 0.095 | −6.18683
chr16 | 34576478 | 3.29E−07 | 0.01592 | −5.61188
chr16 | 34582466 | 1.44E−06 | 0.043574 | 2.624713
chr16 | 34595076 | 1.46E−06 | 0.043794 | −8.51925
chr16 | 34595124 | 7.80E−08 | 0.005739 | −8.47079
chr16 | 46387382 | 1.16E−06 | 0.037643 | −5.40415
chr16 | 46389677 | 2.91E−06 | 0.069394 | −6.99576
chr16 | 46400320 | 3.79E−07 | 0.017592 | −8.23138
chr16 | 46400330 | 6.60E−07 | 0.026342 | −8.01991
chr17 | 18625364 | 3.72E−06 | 0.079288 | −21.5684
chr17 | 20320695 | 3.01E−06 | 0.070988 | −6.81262
chr17 | 20320736 | 3.32E−06 | 0.074525 | −6.37515
chr17 | 26885700 | 1.09E−08 | 0.001083 | 15.43918
chr17 | 78921813 | 2.64E−06 | 0.066643 | −10.1664
chr18 | 9102789 | 1.05E−06 | 0.036894 | −12.8311
chr18 | 9102821 | 5.02E−08 | 0.003863 | −12.1908
chr18 | 9102824 | 2.82E−08 | 0.002387 | −12.1118
chr18 | 9102830 | 5.81E−09 | 0.000675 | −11.9481
chr18 | 9102833 | 5.30E−09 | 0.000644 | −11.853
chr18 | 9102851 | 1.11E−06 | 0.037345 | −11.2436
chr18 | 31724111 | 7.16E−08 | 0.005389 | −12.0506
chr18 | 31724113 | 2.62E−07 | 0.013662 | −12.0828
chr18 | 31724118 | 1.15E−06 | 0.037643 | −12.1459
chr18 | 31724148 | 7.49E−07 | 0.028817 | −12.3979
chr18 | 31724187 | 2.85E−06 | 0.069386 | −12.5095
chr18 | 31724276 | 2.68E−07 | 0.013746 | −12.2512
chr18 | 311724334 | 4.78E−06 | 0.095 | −12.4003
chr18 | 31724335 | 5.33E−09 | 0.000644 | −12.3964
chr18 | 31724365 | 2.30E−07 | 0.013199 | −12.2822
chr18 | 31724433 | 2.30E−06 | 0.060665 | −11.7912
chr18 | 31724435 | 1.19E−06 | 0.037839 | −11.7942
chr18 | 31724440 | 1.17E−06 | 0.037643 | −11.7475
chr18 | 31724443 | 4.18E−06 | 0.086773 | −11.7545
chr18 | 31724529 | 4.87E−07 | 0.021667 | −12.3839
chr18 | 31724541 | 2.16E−06 | 0.058943 | −12.531
chr18 | 31724583 | 3.87E−07 | 0.017721 | −13.5801
chr18 | 31724606 | 2.80E−07 | 0.013996 | −14.2326
chr18 | 31724622 | 2.41E−07 | 0.01339 | −14.6741
chr19 | 16719929 | 2.29E−06 | 0.060665 | −20.5149
chr19 | 16719994 | 2.31E−06 | 0.060665 | −18.6129
chr19 | 52517362 | 1.24E−07 | 0.008539 | −25.6849
chr2 | 48532708 | 3.54E−07 | 0.016656 | −16.3139
chr2 | 128901652 | 7.77E−07 | 0.029333 | −20.0871
chr2 | 128901658 | 2.75E−06 | 0.067887 | −20.4302
chr2 | 128901702 | 1.61E−06 | 0.046109 | −21.9965
chr2 | 128901778 | 4.36E−07 | 0.019674 | −23.3685
chr2 | 128901800 | 2.49E−07 | 0.013611 | −23.8407
chr2 | 128901810 | 1.44E−06 | 0.043574 | −23.913
chr2 | 128901821 | 3.51E−06 | 0.076076 | −24.1044
chr2 | 128901846 | 3.05E−06 | 0.071291 | −24.2352
chr2 | 188034367 | 2.09E−06 | 0.058014 | −21.0074
chr22 | 22297772 | 2.17E−09 | 0.000366 | −13.8252
chr22 | 22297775 | 3.24E−09 | 0.000476 | −14.1664
chr22 | 22297780 | 2.06E−08 | 0.001941 | −14.7199
chr22 | 22297786 | 4.69E−09 | 0.000611 | −15.3596
chr22 | 22297801 | 6.08E−09 | 0.000675 | −14.9302
chr22 | 22297836 | 6.56E−13 | 2.78E−07 | −19.0675
chr22 | 22297844 | 2.60E−14 | 2.20E−08 | −19.3494
chr22 | 22297852 | 2.24E−11 | 6.88E−06 | −18.6188
chr22 | 22297861 | 1.19E−11 | 4.02E−06 | −18.1964
chr22 | 22297866 | 3.36E−10 | 7.10E−05 | −16.991
chr22 | 22297870 | 4.69E−09 | 0.000611 | −16.5603
chr22 | 22297874 | 5.61E−10 | 0.000112 | −16.3023
chr3 | 12542803 | 1.18E−07 | 0.008329 | −12.4785
chr3 | 49358049 | 3.19E−06 | 0.073473 | −8.95098
chr3 | 49358057 | 7.50E−11 | 1.95E−05 | −9.26347
chr3 | 49358073 | 3.59E−09 | 0.000506 | −9.58429
chr3 | 49358074 | 7.49E−10 | 0.000141 | −9.60497
chr3 | 49358080 | 7.12E−09 | 0.000753 | −9.55828
chr3 | 49358081 | 9.92E−09 | 0.001018 | −9.57239
chr3 | 49358086 | 2.60E−07 | 0.013662 | −9.52519
chr3 | 49358088 | 3.45E−07 | 0.01646 | −9.53955
chr3 | 49358089 | 3.05E−09 | 0.00047 | −9.54489
chr3 | 49358098 | 2.03E−07 | 0.012966 | −9.56184
chr3 | 49358099 | 1.48E−09 | 0.000263 | −9.55482
chr3 | 49358118 | 2.34E−08 | 0.002033 | −9.38281
chr3 | 49358119 | 2.77E−10 | 6.25E−05 | −9.35371
chr3 | 49358123 | 1.23E−06 | 0.038963 | −9.41791
chr3 | 49358124 | 2.79E−11 | 7.88E−06 | −9.38576
chr3 | 49689148 | 2.30E−08 | 0.002033 | −8.63847
chr3 | 49689161 | 2.56E−09 | 0.000413 | −8.14538
chr3 | 49689191 | 6.20E−07 | 0.025594 | −6.60995
chr3 | 52317368 | 3.13E−06 | 0.072601 | −12.8742
chr4 | 2430180 | 1.30E−06 | 0.039991 | −8.00714
chr4 | 8254522 | 2.98E−07 | 0.014615 | −7.09407
chr4 | 20003976 | 9.39E−07 | 0.034543 | −16.5665
chr4 | 52051137 | 2.23E−07 | 0.013199 | −9.05182
chr4 | 140755696 | 7.90E−07 | 0.029375 | −4.19519
chr4 | 151409307 | 3.89E−06 | 0.081301 | −6.26016
chr4 | 151409311 | 2.51E−06 | 0.064797 | −6.01374
chr4 | 151409314 | 3.36E−06 | 0.074525 | −6.15508
chr4 | 151409322 | 1.51E−06 | 0.044837 | −6.35103
chr4 | 151409329 | 4.75E−06 | 0.095 | −6.48028
chr4 | 151409364 | 2.11E−06 | 0.058014 | −6.9842
chr4 | 169030994 | 4.79E−06 | 0.095 | −13.0722
chr5 | 117304877 | 1.49E−07 | 0.009912 | −7.84561
chr5 | 117304880 | 7.31E−07 | 0.028432 | −7.83696
chr5 | 117304901 | 2.56E−07 | 0.013662 | −7.54501
chr5 | 117304913 | 2.41E−07 | 0.01339 | −7.31788
chr6 | 14088969 | 2.31E−06 | 0.060665 | −28.6201
chr6 | 112366877 | 6.99E−07 | 0.027505 | −13.3154
chr6 | 155449253 | 2.68E−06 | 0.067096 | −11.4461
chr6 | 162728535 | 1.54E−06 | 0.044858 | −15.2105
chr6 | 167672244 | 1.72E−06 | 0.048802 | 3.420434
chr7 | 752348 | 3.61E−06 | 0.077419 | −4.15708
chr7 | 752349 | 1.10E−06 | 0.037345 | −4.13669
chr7 | 1870636 | 1.11E−06 | 0.037345 | 4.898359
chr7 | 73271542 | 3.47E−06 | 0.076076 | −13.4436
chr7 | 74480244 | 3.86E−06 | 0.08125 | −12.3172
chr7 | 158959750 | 1.06E−06 | 0.036894 | 5.219832
chr8 | 69711200 | 7.80E−07 | 0.029333 | −16.5349
chr8 | 69711209 | 2.14E−07 | 0.013192 | −15.9125
chr8 | 109688955 | 2.73E−06 | 0.067887 | −24.1368
chr8 | 131038983 | 5.73E−07 | 0.024225 | −11.9616
chr9 | 106038008 | 3.59E−06 | 0.077419 | 10.21198
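Table 18 lists, for each CpG site, a p-value, a q-value, and the case-minus-control difference in percent methylation. The text does not state which per-site test or multiple-testing correction produced these columns. Purely as a hedged illustration of how such columns are commonly derived, the Python sketch below computes a case-minus-control difference in mean percent methylation and Benjamini-Hochberg adjusted p-values, one common form of q-value. The per-site statistical test itself is not implemented, and the function names are hypothetical.

```python
def percent_meth_difference(case: list[float], control: list[float]) -> float:
    """Mean % methylation in cases minus mean % methylation in controls."""
    return sum(case) / len(case) - sum(control) / len(control)


def bh_adjust(pvalues: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


print(percent_meth_difference([10.0, 20.0], [30.0, 40.0]))  # -20.0
print(bh_adjust([0.01, 0.04, 0.03, 0.5]))
```

A negative difference, as for most rows of Table 18, means lower methylation in the case (ovarian cancer) plasma than in the control plasma.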

Methylation density plots of the data are shown in FIG. 47. These plots provide a high-level view of the data mapped across various genomic elements. The results confirm a consistent degree of hypomethylation in plasma from ovarian cancer patients, which likely reflects the generally hypomethylated state of the tumors.

Example Embodiment L of Example 6: Estimation of Abnormal Plasma Methylome Variation in Targeted Regions for Diagnosis of Ovarian Cancer

Provided below is an algorithm that can be used to diagnose ovarian cancer in a subject. The presently disclosed subject matter provides that the methylome(s) of the ovaries of a mammal, or of structures therein, can be affected by certain abnormalities (e.g., ovarian cancer), and that changes in these methylomes can lead to changes in the methylation patterns of the DNA fragments that ovarian tissues release into plasma. An algorithm was developed to identify changes in the methylation patterns of the plasma methylome caused by ovarian phenotypes. One insight behind this algorithm was that the methylome of the DNA fragments in plasma is a mixture of a variety of component methylomes of ovarian and other origins, and that the proportions of these component methylomes in the mixture vary from subject to subject, even within the population with a normal ovarian phenotype. By modeling the plasma methylome as a linear combination of component methylomes of ovarian and other origins, the algorithm can accurately predict the methylation patterns of a new plasma sample under the hypothesis that it is from a normal individual. Consequently, the algorithm has high sensitivity for detecting abnormal methylation patterns in a plasma sample that are caused by changes in the methylomes of ovarian/fallopian or other relevant tissue when the sample is from an affected individual.

The procedure can be applied with little modification to the diagnosis and phenotyping of ovarian cancer using other types of biopsy samples, such as cervical swabs, urine and peritoneal fluid, provided that the DNA fragments from the tissues affected by ovarian cancer can be found in those biopsy samples.

Let $i$ be any CpG site in the human genome, $z_{i,j}$ the methylation level of CpG site $i$ in plasma sample $j$, $p_{i,r,j}$ the proportion of the $r$-th component methylome $m_{r,j}$ of ovarian/fallopian or other relevant tissue origin in plasma sample $j$ at site $i$, and $m_{i,r,j}$ the methylation level of CpG site $i$ in methylome $m_{r,j}$. The hypothesis is:

$$z_{i,j} = \sum_{r=1}^{R} p_{i,r,j}\, m_{i,r,j} \qquad (1)$$

where $p_{i,r,j} \ge 0$, $0 \le m_{i,r,j} \le 1$, and $p_{i,1,j} + \cdots + p_{i,R,j} = 1$.

It is further assumed that there is a set of CpG sites $S$ such that, for any CpG site $i$ in $S$ and any plasma sample $j$ from a normal individual, $m_{i,r,j} = m_{i,r}$ and $p_{i,r,j} = p_{r,j}$.

That is, it is assumed that in any plasma sample from a normal individual, the proportions of the different component methylomes in the mixture are the same for all CpG sites in $S$. It is also assumed that, when restricted to the set of CpG sites $S$, plasma samples from all normal individuals have the same set of component methylomes. These are called restricted reference component methylomes (RRCM) and are labeled $m_1^{S}, \ldots, m_R^{S}$, or simply $m_1, \ldots, m_R$ when there is no ambiguity. For any plasma sample $j$ from a normal individual, its methylome restricted to the CpG sites in $S$ can be expressed as a weighted average of the restricted reference component methylomes. More precisely, let $z_j^{S}$ be the methylome of plasma sample $j$ restricted to $S$; then, for some mixture vector $p_j = [p_{j,1}, \ldots, p_{j,R}]^{T}$:

$$z_j^{S} = [m_1^{S}, \ldots, m_R^{S}]\, p_j \qquad (2)$$
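As a worked illustration of equation (2), the short Python sketch below forms a restricted plasma methylome by multiplying the matrix of component methylomes by a mixture vector. The two components, three sites, and the 0.75/0.25 proportions are hypothetical numbers chosen only to show the arithmetic, and the function name is likewise hypothetical.

```python
def mix_methylomes(components: list[list[float]], proportions: list[float]) -> list[float]:
    """Equation (2): z_j^S = [m_1^S, ..., m_R^S] p_j.

    components[r][i] is the methylation level of site i in component r;
    proportions[r] is p_{j,r}. Proportions must be non-negative and sum to 1.
    """
    if any(p < 0 for p in proportions) or abs(sum(proportions) - 1.0) > 1e-9:
        raise ValueError("mixture vector must be non-negative and sum to 1")
    n_sites = len(components[0])
    return [
        sum(p * comp[i] for p, comp in zip(proportions, components))
        for i in range(n_sites)
    ]


# Two hypothetical component methylomes over three CpG sites.
m1 = [0.9, 0.1, 0.5]
m2 = [0.2, 0.8, 0.5]
print(mix_methylomes([m1, m2], [0.75, 0.25]))
```

With these inputs the result is approximately [0.725, 0.275, 0.5]. The third site, where both components are 50% methylated, carries no information about the mixture proportions, which is one reason step 1.3 keeps only sites where the component means differ.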

Finally, it is assumed that the set $S$ is the union of two disjoint subsets $C$ and $T$, where $T$ is the union of $K$ non-empty sets $T_k$, $T = \bigcup_{k=1}^{K} T_k$, and the index $k$ denotes the $k$-th type of abnormal tissue phenotype. The sets $T_k$ need not be disjoint. Moreover, each $T_k$ is itself the union of two disjoint sets $D_k$ and $V_k$, either of which (but not both) may be empty. It is assumed that for any plasma sample, including one from an abnormal individual, its methylome restricted to the CpG sites in $C$ can always be expressed as a weighted average of the restricted reference component methylomes. That is, $z_j^{C} = [m_1^{C}, \ldots, m_R^{C}]\, p_j$ regardless of whether $j$ is from an abnormal individual. $C$ is called the set of reference CpG sites. On the other hand, for a plasma sample $l$ from an abnormal individual, its methylome restricted to the CpG sites in $S = C \cup T$ can no longer be expressed as a weighted average of the restricted reference component methylomes. That is, $w_l^{S} \neq [m_1^{S}, \ldots, m_R^{S}]\, p_l$ for any mixture vector $p_l$. More specifically, for a plasma sample $l$ from an individual with the $k$-th type of abnormal phenotype: 1) $w_l^{C} = [m_1^{C}, \ldots, m_R^{C}]\, p_l$; 2) if $D_k$ is non-empty, $w_l^{D_k} = [m_{1,k}^{D_k}, \ldots, m_{R,k}^{D_k}]\, p_l$, where $[m_1^{D_k}, \ldots, m_R^{D_k}] \neq [m_{1,k}^{D_k}, \ldots, m_{R,k}^{D_k}]$; and 3) if $V_k$ is non-empty, $w_l^{V_k} = [m_1^{V_k}, \ldots, m_R^{V_k}]\, q_l$, where $p_l \neq q_l$. In other words, in a plasma sample from an individual with the $k$-th type of abnormality, if $D_k$ is not empty, the component methylomes of sample $l$ restricted to $D_k$ differ from the reference component methylomes restricted to $D_k$. If $V_k$ is not empty, the proportions of the reference component methylomes in this sample restricted to $V_k$ differ from their proportions restricted to $C$.

T is called the target set of CpG sites, D_(k) is called the differential methylation target set, V_(k) is called the copy number variation target set, and T_(k) is called the target set for the k^(th) type of abnormal phenotype.

The main steps of the algorithm of this Example are:

1) Identify the set of reference CpG sites $C$ and the target sets $T_1, \ldots, T_K$ for the $K$ types of abnormal individuals.
2) Estimate the restricted reference component methylomes $m_1, \ldots, m_R$, or $R$ predictor methylomes $n_1, \ldots, n_R$ that are linearly independent combinations of the reference component methylomes, such that $n_r = [m_1, \ldots, m_R]\, q_r$ for $R$ linearly independent mixture vectors $q_1, \ldots, q_R$.
3) (Optional) If the reference component methylomes are available, estimate the proportions of these components at the reference CpG sites $C$ for the test plasma samples.
4) Predict the methylation levels of the test plasma samples at the target set $T_k$ of CpG sites, under the hypothesis that the sample is from a normal individual.
5) Compare the predicted methylation levels at $D_k$ and $V_k$ against the observed methylation levels, and reject the null hypothesis that a test sample is from a normal individual if the observed methylation levels differ significantly from the predicted levels.

The algorithm of this Example can be implemented in a variety of ways. For example, given methyl-seq data for a set of plasma samples from normal individuals, the presently disclosed EM algorithm or the data augmentation method can be applied to estimate the component methylomes, and the maximum likelihood method can then be used to estimate the proportions of these component methylomes in the test sample. Below are exemplary simple implementations of the presently disclosed algorithm that use linear regression.

In the simplest implementation of the algorithm of this Example, it is assumed that the restricted methylome of a plasma sample from a normal individual can be approximated by a mixture of two restricted reference methylomes. It is further assumed that estimates of these two reference component methylomes are available. For example, in the implementation below, for the genomic loci of interest, the plasma methylome is approximated by a mixture of leukocyte and ovarian/fallopian or other relevant tissue methylomes. The implementation of the algorithm includes the following steps:

1. Identify the Reference Set C and the Target Sets T₁, . . . , T_(K).

1.1 Collect the methylation data for a set of leukocyte samples, a set of ovarian/fallopian or other relevant tissue/cell samples, and a set of plasma samples, all from normal individuals. For each type of abnormal individual, collect a set of leukocyte-derived samples, a set of ovarian/fallopian or other relevant tissue/cell samples, and a set of plasma samples from individuals of that type. All of these samples should be matched for age, race, and other relevant parameters. These are the training data.

1.2 Let $x_{i,j}$ be the observed methylation level of CpG site $i$ in a normal leukocyte-derived sample $j$, $y_{i,l}$ the observed methylation level of CpG site $i$ in a normal ovarian/fallopian or other relevant tissue/cell sample $l$, $s_{x,i}^2$ the sample variance of $x_{i,j}$ over all normal leukocyte-derived samples, and $s_{y,i}^2$ the sample variance of $y_{i,l}$ over all normal ovarian/fallopian or other relevant tissue/cell samples. Identify the set of CpG sites $S_0$ such that, for any $i \in S_0$, both $s_{x,i}^2 < c_0$ and $s_{y,i}^2 < c_0$ for some constant $c_0$. These are CpG sites with stable methylation levels in each type of normal cell.

1.3 Let $x_{i,j}$ be the observed methylation level of CpG site $i$ in a leukocyte-derived sample $j$, normal or abnormal, $y_{i,l}$ the observed methylation level of CpG site $i$ in an ovarian/fallopian or other relevant tissue/cell sample $l$, normal or abnormal, $s_{x,i}^2$ the sample variance of $x_{i,j}$ over all leukocyte-derived samples (normal and abnormal), and $s_{y,i}^2$ the sample variance of $y_{i,l}$ over all ovarian/fallopian or other relevant tissue/cell samples (normal and abnormal). Identify the set of CpG sites $S_1$ such that, for any $i \in S_1$: both $s_{x,i}^2 < c_0$ and $s_{y,i}^2 < c_0$ for some constant $c_0$; the statistical test for the difference between $\{x_{i,j_0} : j_0 \text{ is a normal leukocyte-derived sample}\}$ and $\{x_{i,j_k} : j_k \text{ is an abnormal leukocyte-derived sample of type } k\}$ is not significant for every abnormal type $k$; and the statistical test for the difference between $\{y_{i,j_0} : j_0 \text{ is a normal ovarian/fallopian or other relevant tissue/cell sample}\}$ and $\{y_{i,j_k} : j_k \text{ is an abnormal ovarian/fallopian or other relevant tissue/cell sample of type } k\}$ is not significant for every abnormal type $k$. These are CpG sites with stable methylation levels in each type of cell and with no difference in methylation level between normal and any abnormal samples. Let $x_i$ be the sample mean of $x_{i,j}$ over all leukocyte-derived samples, normal and abnormal, and $y_i$ the sample mean of $y_{i,l}$ over all ovarian/fallopian or other relevant tissue/cell samples, normal and abnormal. Identify the subset $C_0$ of $S_1$ such that, for any $i \in C_0$, $|x_i - y_i| > c_1$ for some constant $c_1$. These are CpG sites that are stably methylated in each cell type, show no difference between normal and abnormal samples of the same cell type, and are differentially methylated between cell types.

1.4 Let $x^{C_0}$ be the vector of $x_i$ for all $i \in C_0$, and $y^{C_0}$ the vector of $y_i$ for all $i \in C_0$, where $x_i$ is the mean methylation at site $i$ across all leukocyte-derived samples and $y_i$ the mean methylation at site $i$ across all ovarian/fallopian or other relevant tissue/cell samples. Note that, because of the way $C_0$ is selected, there is no difference in the methylation level of any CpG site in $C_0$ between normal and abnormal leukocyte-derived samples, or between normal and abnormal ovarian/fallopian or other relevant tissue/cell samples. Let $z_j^{C_0}$ be the observed methylation levels of the CpG sites in $C_0$ for a plasma sample $j$ of the $k$-th abnormal type. (For convenience, a normal plasma sample is referred to as a sample of the 0th abnormal type.) For each sample $j$ belonging to the $k$-th abnormal type, regress $z_j^{C_0}$ against $x^{C_0}$ and $y^{C_0}$, with the constraints that the intercept must be 0 and the coefficients must be non-negative and sum to 1, and obtain the residual $e_j^{C_0}$. Identify the subset $C_0^{k}$ of $C_0$ such that, for any CpG site $i$ in $C_0^{k}$,

$$\frac{e_{i,k}^{2}}{s_{i,k}} < c_{2} \quad \text{and} \quad e_{i,k}^{2} < c_{3}$$

for some constants $c_2$ and $c_3$, where $e_{i,k}^2$ is the mean of the squared differences between the estimated and observed methylation levels of CpG site $i$ over all plasma samples of the $k$-th abnormal type, and $s_{i,k}^2$ is the sample variance of the methylation levels of CpG site $i$ in the same set of plasma samples. After repeating this procedure for each type of abnormal plasma sample, the intersection $C = \bigcap_{k=0}^{K} C_0^{k}$ is the reference set of CpG sites. These are CpG sites at which the methylation levels of both normal and any type of abnormal plasma sample can be accurately predicted from the reference component methylomes of normal individuals.

1.5 Let $T_0 = S_0 \setminus S_1$. Let $x^{C}$ and $x^{T_0}$ be the vectors of $x_i$ and $x_h$ for all $i \in C$ and $h \in T_0$, respectively, and let $y^{C}$ and $y^{T_0}$ be the vectors of $y_i$ and $y_h$ for all $i \in C$ and $h \in T_0$, respectively. Here $x_i$ and $x_h$ are the mean methylation levels of normal leukocyte-derived DNA at sites $i$ and $h$, and $y_i$ and $y_h$ are the mean methylation levels of normal ovarian/fallopian or other relevant tissue/cell DNA at sites $i$ and $h$. Let $z_j^{C}$ and $z_j^{T_0}$ be the observed methylation levels of the CpG sites in $C$ and $T_0$, respectively, for a normal plasma sample $j$. Let $w_{l_k}^{C}$ and $w_{l_k}^{T_0}$ be the observed methylation levels of the CpG sites in $C$ and $T_0$, respectively, for a plasma sample $l_k$ from an individual with the $k$th type of abnormality, and let $w_{l_g}^{C}$ and $w_{l_g}^{T_0}$ be the observed methylation levels of the CpG sites in $C$ and $T_0$, respectively, for a plasma sample $l_g$ from an individual with the $g$th type of abnormality, where $g \ne k$. For each $j$, $l_k$, and $l_g$, regress $z_j^{C}$, $w_{l_k}^{C}$, and $w_{l_g}^{C}$, respectively, against $x^{C}$ and $y^{C}$, with the constraints that the intercept must be 0 and the coefficients must be non-negative and sum to 1. Apply the fitted models to $x^{T_0}$ and $y^{T_0}$ to predict $z_j^{T_0}$, $w_{l_k}^{T_0}$, and $w_{l_g}^{T_0}$, respectively, and obtain the differences $e_j^{T_0}$, $e_{l_k}^{T_0}$, and $e_{l_g}^{T_0}$ between the predicted and observed values. For each CpG site $i$, let $e_i$, $e_{i,k}$, and $e_{i,g}$ be the means of the sets of differences $\{e_j^{T_0} : j \text{ is a normal plasma sample}\}$, $\{e_{l_k}^{T_0} : l_k \text{ is a plasma sample of the } k\text{th abnormal type}\}$, and $\{e_{l_g}^{T_0} : l_g \text{ is a plasma sample of the } g\text{th abnormal type}\}$, respectively.

Identify the subset $T_k$ of $T_0$ such that, for any $i \in T_k$, $|e_i| < c_{2,0}$, $|e_{i,k}| > c_{2,k}$, and $|e_{i,k} - e_{i,g}| > c_{3,k}$ for all $g \ne k$, for some constants $c_{2,0}$, $c_{2,k}$, and $c_{3,k}$. $T_k$ is the target set for the $k$th type of abnormal individual. These are the sites at which the methylation of a normal plasma sample can be accurately predicted, the observed methylation in a plasma sample of the $k$th abnormal type deviates from the prediction, and that deviation differs from the deviation in a plasma sample of any other abnormal type.
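The target-set criteria of step 1.5 can be sketched in Python as follows. This is an illustrative sketch only. It assumes that the mean differences $e_i$ and $e_{i,g}$ have already been computed for every site in $T_0$ and are stored as plain lists indexed by position in $T_0$. The function name `select_target_sites`, the dictionary keyed by abnormal type, and the threshold values are hypothetical choices, not part of the Example.

```python
def select_target_sites(e_normal, e_abnormal, k, c20, c2k, c3k):
    """Return positions in T_0 that satisfy the step 1.5 criteria for T_k.

    e_normal[i]: mean difference e_i over normal plasma samples.
    e_abnormal[g][i]: mean difference e_{i,g} over plasma samples of type g.
    k: the abnormal type whose target set is being built.
    c20, c2k, c3k: the constants c_{2,0}, c_{2,k}, c_{3,k} (user-chosen).
    """
    e_k = e_abnormal[k]
    other_types = [g for g in e_abnormal if g != k]
    target = []
    for i, e_i in enumerate(e_normal):
        if abs(e_i) >= c20:
            continue  # normal plasma is not predicted accurately enough
        if abs(e_k[i]) <= c2k:
            continue  # type-k plasma does not deviate enough from prediction
        # the type-k deviation must differ from that of every other type
        if all(abs(e_k[i] - e_abnormal[g][i]) > c3k for g in other_types):
            target.append(i)
    return target
```

With two abnormal types, for example, a site is kept for type 1 only if its normal-sample residual is small, its type-1 residual is large, and its type-1 residual is separated from the type-2 residual by more than $c_{3,k}$.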

2. Estimate the Component Fractions of the New Plasma Samples to be Tested

Recall that $x^{C}$ and $y^{C}$ are the mean vectors of the methylation levels of the training leukocyte-derived and training ovarian/fallopian or other relevant tissue/cell data for the CpG sites in the reference set $C$. For any new plasma sample $t$ to be tested, let $z_t^{C}$ be the observed methylation levels of the CpG sites in $C$. Regress $z_t^{C}$ against $x^{C}$ and $y^{C}$, with the constraints that the intercept must be 0 and the coefficients must be non-negative and sum to 1. The estimated coefficients are the estimated fractions of the component methylomes in plasma sample $t$.
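In the two-component case described here, the constrained regression has a simple closed form. The sketch below relies on that form and is not a general solver. With zero intercept and coefficients $a$ and $1-a$ restricted to $[0, 1]$, the squared error is a convex quadratic in the single parameter $a$. Clipping the unconstrained least-squares minimiser to $[0, 1]$ therefore gives the constrained solution. The names `estimate_fractions` and `predict` are hypothetical. With more than two component methylomes, a general constrained least-squares solver would be needed instead.

```python
def estimate_fractions(z, x, y):
    """Fit z ~ a*x + b*y with zero intercept, a, b >= 0 and a + b = 1.

    z: observed plasma methylation levels at the reference sites C.
    x, y: mean leukocyte-derived and ovarian/fallopian levels at C.
    Returns (a, b), the estimated fractions of the two component methylomes.
    """
    d = [xi - yi for xi, yi in zip(x, y)]  # z - y = a * (x - y) + error
    r = [zi - yi for zi, yi in zip(z, y)]
    dd = sum(di * di for di in d)
    if dd == 0:
        raise ValueError("x and y are identical on the reference sites")
    a = sum(ri * di for ri, di in zip(r, d)) / dd  # unconstrained minimiser
    a = min(1.0, max(0.0, a))                      # project onto [0, 1]
    return a, 1.0 - a


def predict(x_target, y_target, fractions):
    """Apply fitted fractions to component levels at other sites (e.g. T_k)."""
    a, b = fractions
    return [a * xi + b * yi for xi, yi in zip(x_target, y_target)]
```

Under the normal-sample hypothesis, `predict` plays the role of applying the fitted model to $x^{T_k}$ and $y^{T_k}$ in step 3, or to $x^{T_0}$ and $y^{T_0}$ in step 1.5.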

3. Test if the New Plasma Samples are from the $k$th Type of Abnormal Individual

For the new plasma sample $t$, let $x^{T_k}$ and $y^{T_k}$ be the mean vectors of the methylation levels of the training leukocyte-derived and training ovarian/fallopian or other relevant tissue/cell data for the CpG sites in the target set $T_k$ identified in step 1 of this algorithm. Apply the fitted regression model obtained in step 2 of this algorithm to $x^{T_k}$ and $y^{T_k}$ to predict the methylation levels of the CpG sites in $T_k$ for sample $t$, under the hypothesis that sample $t$ is from a normal individual. Let $n_k$ be the number of CpG sites in $T_k$. Define the functions $f_k(x_1, \ldots, x_{n_k}) = \sum_i (-1)^{I(e_{i,k} - e_i)} x_i$ and $f_{k,g}(x_1, \ldots, x_{n_k}) = \sum_i (-1)^{I(e_{i,k} - e_{i,g})} x_i$, where $I(\cdot) = I_{(-\infty,0)}(\cdot)$ is the indicator function of the interval $(-\infty, 0)$, and $e_i$, $e_{i,k}$, and $e_{i,g}$ are the estimates obtained in step 1.5 of the algorithm. The sample is said to be from an individual with the $k$th type of abnormal phenotype if $f_k(e_{1,t} - e_1, \ldots, e_{n_k,t} - e_{n_k}) > c_{4,k}$ and $f_{k,g}(e_{1,t} - e_{1,g}, \ldots, e_{n_k,t} - e_{n_k,g}) > c_{5,g}$ for all $g \ne k$. Here $e_{i,t}$ is the difference between the observed methylation level of CpG site $i \in T_k$ in sample $t$ and the value predicted by the fitted model obtained in step 2, and $g$ ranges over the types of abnormal phenotype that differ from the $k$th type.
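The decision rule of step 3 can be sketched as follows. This minimal sketch assumes two inputs: the residuals $e_{i,t}$ of the new sample at the sites in $T_k$, and the step 1.5 means $e_i$ and $e_{i,g}$ restricted to $T_k$ in the same site order, all stored as lists. The factor $(-1)^{I_{(-\infty,0)}(s)}$ equals $-1$ when $s < 0$ and $+1$ otherwise. Each site therefore contributes its deviation signed in the direction expected for the $k$th abnormal type. The function names and the dictionary holding the $c_{5,g}$ thresholds are hypothetical.

```python
def sign_weighted_sum(values, signs_from):
    """Sum of (-1)^{I_(-inf,0)(s)} * v: +v when s >= 0, -v when s < 0."""
    return sum((-1.0 if s < 0 else 1.0) * v for v, s in zip(values, signs_from))


def is_type_k(e_t, e_normal, e_abnormal, k, c4k, c5):
    """Step 3 test: is new sample t from the k-th abnormal type?

    e_t[i]: residual e_{i,t} of sample t at the i-th site of T_k.
    e_normal[i]: e_i from step 1.5; e_abnormal[g][i]: e_{i,g} from step 1.5.
    c4k: threshold c_{4,k}; c5[g]: threshold c_{5,g} for each other type g.
    """
    e_k = e_abnormal[k]
    # f_k: deviation from the normal-sample mean, signed by sign(e_{i,k} - e_i)
    f_k = sign_weighted_sum(
        [a - b for a, b in zip(e_t, e_normal)],
        [a - b for a, b in zip(e_k, e_normal)],
    )
    if f_k <= c4k:
        return False
    for g, e_g in e_abnormal.items():
        if g == k:
            continue
        # f_{k,g}: deviation from type-g mean, signed by sign(e_{i,k} - e_{i,g})
        f_kg = sign_weighted_sum(
            [a - b for a, b in zip(e_t, e_g)],
            [a - b for a, b in zip(e_k, e_g)],
        )
        if f_kg <= c5[g]:
            return False
    return True
```

A sample whose residuals all match the normal means scores $f_k = 0$ and is therefore not assigned to the $k$th type for any positive $c_{4,k}$.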

Other implementations of the algorithm of this Example can be developed by modifying the simple implementation presented above. Specifically, the algorithm need not assume that the plasma methylomes are made up of only two component reference methylomes, nor does it need to approximate the plasma methylomes by mixtures of the component methylomes. Instead, a set of predictor methylomes can be collected that are themselves mixtures of the component reference methylomes. This works as long as the number of predictor methylomes equals the number of reference component methylomes and the mixture vectors of the predictor methylomes are linearly independent. For example, the predictor methylomes can be methylomes of plasma samples with known, different proportions of leukocyte-derived and ovarian/fallopian or other relevant tissue/cell DNA.
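The linear-independence condition can be motivated with a short derivation. The notation here is introduced only for illustration and is not taken from the Example. Let $M$ be the matrix whose $R$ columns are the reference component methylomes. Let the predictor methylomes be the columns of $P = MA$, where column $s$ of the $R \times R$ matrix $A$ is the mixture vector of predictor $s$, with non-negative entries that sum to 1. If the plasma methylome of sample $j$ is $z_j = M p_j$ and $A$ is invertible, then

$z_j = M p_j = (MA)(A^{-1} p_j) = P q_j, \qquad q_j = A^{-1} p_j.$

Because every column of $A$ sums to 1, $\mathbf{1}^{\top} A = \mathbf{1}^{\top}$, hence $\mathbf{1}^{\top} A^{-1} = \mathbf{1}^{\top}$ and

$\mathbf{1}^{\top} q_j = \mathbf{1}^{\top} A^{-1} p_j = \mathbf{1}^{\top} p_j = 1.$

The coefficients on the predictor methylomes therefore still sum to 1. However, individual entries of $q_j$ can be negative when $p_j$ lies outside the convex hull of the columns of $A$.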

In the algorithm of this Example, the difference between the observed methylation levels in certain target regions and the predicted methylation levels is used as the test statistic to determine whether the methylome of a plasma sample has been affected by ovarian cancer. To illustrate the advantage of this approach, assume that the mixture vector $p_j$ for the methylome of a normal plasma sample $j$ follows a Dirichlet distribution with parameters $\alpha_1 = \cdots = \alpha_R$. Furthermore, assume that for CpG site $i$ the methylation levels in the $R$ reference component methylomes are $m_{i,r} = (r-1)/(R-1)$, $r = 1, \ldots, R$. It can be shown that the methylation level of site $i$ in sample $j$ then has a mean of 0.5 and a variance of

$\frac{R + 1}{12\left( {R - 1} \right)\left( {{R\;\alpha_{1}} + 1} \right)}.$

If there is a methyl-seq library of sample $j$ with a coverage of $N$ at CpG site $i$, the variance of the measured methylation level $z_{i,j}$ is

$\sigma_{1}^{2} = {\frac{1}{4N} + {\frac{N - 1}{N}{\frac{R + 1}{12\left( {R - 1} \right)\left( {{R\;\alpha_{1}} + 1} \right)}.}}}$

In other words, if $z_{i,j}$ is used as a test statistic to detect and phenotype ovarian cancer using a plasma sample, then under the null hypothesis the test statistic has a variance of $\sigma_1^2$. In the algorithm of this Example, however, the mixture vector $p_j$ is first estimated, and $z_{i,j}$ is then predicted by $\sum_r m_{i,r} p_{r,j}$. In methyl-seq data, each library can contain millions of CpG sites, and the variance of the coefficients in a linear regression model is inversely proportional to sample size. It is therefore possible to estimate the mixture vector $p_j$ with high accuracy, even taking into account that adjacent CpG sites tend to have correlated methylation levels. Assume that an accurate estimate of $\sum_r m_{i,r} p_{r,j}$ can be obtained, so that the estimation error can be ignored. The variance of the difference $z_{i,j} - \sum_r m_{i,r} p_{r,j}$ between the observed methylation level and the prediction will then be

$\frac{1}{4N} - {\frac{1}{N}{\frac{R + 1}{12\left( {R - 1} \right)\left( {{R\;\alpha_{1}} + 1} \right)}.}}$

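In other words, under the null hypothesis, the test statistic $z_{i,j} - \sum_r m_{i,r} p_{r,j}$ used in the presently disclosed algorithm has a much smaller variance than the alternative test statistic $z_{i,j}$. Subtracting the two expressions shows that the variances differ by exactly $\frac{R+1}{12(R-1)(R\alpha_1+1)}$, which is the between-sample variance of the true methylation level, and this term dominates at high coverage $N$. This in turn means that the presently disclosed test achieves a higher power at the same level of type I error.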
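The two variance formulas above can be checked numerically with a Monte Carlo simulation. The sketch below uses Python with NumPy, and its function names and parameter values are illustrative assumptions. It draws mixture vectors from a symmetric Dirichlet distribution and sets $m_{i,r} = (r-1)/(R-1)$. It then samples $N$ reads at the site as independent Bernoulli draws given the true level, which is consistent with the $1/(4N)$ terms. The residual is taken against the true level $\sum_r m_{i,r} p_{r,j}$, matching the assumption that the estimation error of the mixture vector can be ignored.

```python
import numpy as np


def theoretical_variances(R, alpha, N):
    """Closed-form variances from the text: (Var z_{i,j}, Var of residual)."""
    v = (R + 1) / (12 * (R - 1) * (R * alpha + 1))  # variance of the true level
    return 1 / (4 * N) + (N - 1) / N * v, 1 / (4 * N) - v / N


def simulate_variances(R, alpha, N, n_samples=100_000, seed=0):
    """Monte Carlo estimates of the same two variances."""
    rng = np.random.default_rng(seed)
    m = np.arange(R) / (R - 1)                             # m_{i,r} = (r-1)/(R-1)
    p = rng.dirichlet(np.full(R, alpha), size=n_samples)   # mixture vectors p_j
    mu = np.clip(p @ m, 0.0, 1.0)                          # true level; guard rounding
    z = rng.binomial(N, mu) / N                            # observed level from N reads
    return z.var(), (z - mu).var()
```

For example, `theoretical_variances(4, 0.5, 30)` and `simulate_variances(4, 0.5, 30)` should agree to within Monte Carlo error. In both, the residual variance is well below the variance of $z_{i,j}$ itself.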

Examples of Embodiments of Example 6

F1. A method for diagnosing, prognosing, classifying, and/or monitoring ovarian cancer in a mammal, comprising:

(a) obtaining a sample from the mammal;

(b) determining the methylation status and/or level of one or more genomic loci in the sample;

(c) comparing the methylation status and/or level of the one or more genomic loci to a reference or a set of predicted values; and

(d) diagnosing ovarian cancer in the mammal,

wherein the difference in the methylation status and/or level of the one or more genomic loci in the sample compared to the reference or the set of predicted values indicates the presence of ovarian cancer in the mammal.

F2. The method of embodiment F1, wherein an increase in the level of methylation of the one or more genomic loci in the sample indicates the presence of ovarian cancer in the mammal or a decrease in the level of methylation of the one or more genomic loci in the sample indicates the presence of ovarian cancer in the mammal.

F3. The method of embodiment F1, wherein a decrease in the level of methylation of at least one of the one or more genomic loci in the sample and an increase in the level of methylation of at least one of the one or more genomic loci in the sample indicates the presence of ovarian cancer in the mammal.

F4. A method of treating ovarian cancer in a mammal, comprising:

(a) obtaining a sample from the mammal;

(b) determining the methylation status and/or level of one or more genomic loci present in the sample;

(c) comparing the methylation status and/or level of the one or more genomic loci to a reference or a set of predicted values;

(d) diagnosing ovarian cancer in the mammal, wherein the difference in the methylation status and/or level of the one or more genomic loci in the sample compared to the reference or the set of predicted values indicates the presence of ovarian cancer in the mammal; and

(e) administering a chemotherapy, an immunotherapy, or both to said mammal.

F5. The method of any one of embodiments F1-F4, wherein the reference is the methylation status and/or level of the one or more genomic loci in a sample obtained from a mammal that does not have ovarian cancer.

F6. The method of any one of embodiments F1-F5, wherein said sample is a plasma sample.

F7. A method of treating ovarian cancer comprising:

(a) measuring the methylation status and/or level of one or more genomic loci present in a sample from a mammal prior to a treatment of ovarian cancer;

(b) measuring the methylation status and/or level of one or more genomic loci present in a sample from the mammal during the treatment of ovarian cancer; and

(c) continuing the treatment if the difference in the methylation status and/or level of the one or more genomic loci between the samples from prior to and during the treatment of ovarian cancer indicates the mammal is responsive to the treatment.

F8. The method of embodiment F7, further comprising (d) administering a different treatment to the mammal if the difference in the methylation status and/or level of the one or more genomic loci between the samples from prior to and during the treatment of ovarian cancer indicates the mammal is not responsive to the treatment.

F9. The method of embodiment F7, wherein an increase in the level of methylation of the one or more genomic loci in the sample indicates the mammal is responsive to the treatment, or a decrease in the level of methylation of the one or more genomic loci in the sample indicates the mammal is responsive to the treatment.

F10. The method of embodiment F7, wherein a decrease in the level of methylation of at least one of the one or more genomic loci in the sample and an increase in the level of methylation of at least one of the one or more genomic loci in the sample indicates the mammal is responsive to the treatment.

F11. The method of embodiment F7, wherein an increase in the level of methylation of the one or more genomic loci in the sample indicates the mammal is not responsive to the treatment, or a decrease in the level of methylation of the one or more genomic loci in the sample indicates the mammal is not responsive to the treatment.

F12. The method of embodiment F7, wherein a decrease in the level of methylation of at least one of the one or more genomic loci in the sample and an increase in the level of methylation of at least one of the one or more genomic loci in the sample indicates the mammal is not responsive to the treatment.

F13. The method of any one of embodiments F7-F12, wherein said sample is a plasma sample.

F14. The method of any one of embodiments F1-F13, wherein the one or more genomic loci comprise one or more CpG sites.

F15. The method of any one of embodiments F1-F14, wherein the one or more genomic loci are present within nucleic acids isolated from the sample.

F16. The method of any one of embodiments F1-F15, wherein the one or more genomic loci are present within cell-free nucleic acids isolated from the sample.

F17. A method of treating ovarian cancer, comprising:

(a) diagnosing ovarian cancer in a mammal by utilization of the algorithm disclosed in Example Embodiment L; and

(b) administering a chemotherapy, an immunotherapy, or both to said mammal to treat said ovarian cancer.

F18. The method of any one of embodiments F1-F17, wherein said mammal is a human.

F19. A kit for diagnosing, prognosing, and/or monitoring ovarian cancer in a mammal comprising a means for determining and/or detecting the methylation status of one or more genomic loci.

F20. The kit of embodiment F19, wherein the means comprises one or more primers and/or probes for determining and/or detecting the methylation status of the one or more genomic loci.

Abstract of this Example (Example 6)

Example 6 provides methods for diagnosing, prognosing, monitoring, classifying, and/or treating ovarian cancer. For example, algorithms, kits, and methods for diagnosing, prognosing, monitoring, classifying, and/or treating ovarian cancer are provided.

Other Embodiments

It is to be understood that while the invention has been described in conjunction with the detailed description thereof, the foregoing description is intended to illustrate and not limit the scope of the invention, which is defined by the scope of the appended claims. Other aspects, advantages, and modifications are within the scope of the following claims. 

1. A computer-implemented method, comprising: obtaining, by a computing system, initial sequence data that describes sequences of an initial set of nucleic acids from a biological sample of a person, the initial set of nucleic acids including nucleic acids originating from a plurality of different tissues of the person; filtering, by the computing system, the initial sequence data to generate filtered sequence data that describes sequences of a filtered subset of nucleic acids from the biological sample, wherein the filtering includes (i) selecting target nucleic acids from the initial set of nucleic acids based on at least one of a methylation characteristic or a copy number characteristic of the target nucleic acids and (ii) enriching the target nucleic acids in the filtered subset; determining, by the computing system, a methylation profile for the filtered subset of nucleic acids from the biological sample; processing, by the computing system, the methylation profile for the filtered subset of nucleic acids to determine a likelihood that the person has a specified medical condition; and outputting, by the computing system, an indication of the likelihood that the person has the specified medical condition.
 2. The computer-implemented method of claim 1, further comprising identifying a pre-defined set of genomic regions; wherein selecting target nucleic acids from the initial set of nucleic acids comprises comparing nucleic acid sequences from the initial set of nucleic acids to sequences from the pre-defined set of genomic regions; and wherein enriching the target nucleic acids in the filtered subset comprises discarding nucleic acid sequences from the initial sequence data that are not among the sequences from the pre-defined set of genomic regions, while retaining nucleic acid sequences from the initial sequence data that are among the sequences from the pre-defined set of genomic regions.
 3. The computer-implemented method of claim 2, wherein at least a first subset of the pre-defined set of genomic regions are defined based on the regions in the first subset exhibiting a minimum level of stability with respect to at least one of the methylation characteristic or the copy number characteristic in a population of individuals.
 4. The computer-implemented method of claim 3, wherein at least a second subset of the pre-defined set of genomic regions are defined based on the regions in the second subset exhibiting at least a minimum difference with respect to the methylation characteristic or the copy number characteristic between individuals who have the specified medical condition and individuals who do not have the specified medical condition.
 5. The computer-implemented method of claim 1, wherein the biological sample comprises plasma, and the initial set of nucleic acids comprises cell-free DNA in the plasma.
 6. The computer-implemented method of claim 1, further comprising: identifying a set of restricted reference component methylomes in the initial set or filtered subset of nucleic acids; identifying a set of reference component methylomes; determining a proportion of the reference component methylomes at a reference set of CpG sites in the initial set or filtered subset of nucleic acids; generating predictions of methylation levels at a target set of CpG sites in the initial set or filtered subset of nucleic acids; comparing the predictions of methylation levels at the target set of CpG sites to observed methylation levels; and determining whether the person likely has or does not have the specified medical condition based on the comparison.
 7. The computer-implemented method of claim 1, wherein the biological sample comprises a stool sample or cerebrospinal fluid.
 8. The computer-implemented method of claim 1, further comprising: determining, by the computing system, a copy number profile for the filtered subset of nucleic acids from the biological sample; and processing, by the computing system, the copy number profile along with the methylation profile for the filtered subset of nucleic acids to determine the likelihood that the person has the specified medical condition.
 9. The computer-implemented method of claim 1, wherein the initial set of nucleic acids were treated to facilitate detection of methylated sites before sequencing.
 10. The computer-implemented method of claim 1, wherein the specified medical condition is ovarian cancer, endometriosis, necrotizing enterocolitis, fetal aneuploidy, preeclampsia, or a brain condition.
 11. The computer-implemented method of claim 1, wherein the methylation profile for the filtered subset of nucleic acids indicates, for each of a plurality of genomic loci, a methylation level of the locus.
 12. The computer-implemented method of claim 1, wherein the genomic loci is a CpG site, CpG island, differentially methylated region (DMR), promoter region, enhancer region, or CpG island shore.
 13. The computer-implemented method of claim 1, wherein determining the likelihood that the person has the specified medical condition comprises determining a probability that the person has the specified medical condition.
 14. The computer-implemented method of claim 1, wherein determining the likelihood that the person has the specified medical condition comprises generating a binary indication that the person either likely has the specified medical condition or likely does not have the specified medical condition.
 15. The computer-implemented method of claim 1, wherein processing the methylation profile comprises providing data representing the methylation profile as input to a machine-learning model, and obtaining the likelihood, or a value from which the likelihood is derived, as an output of the machine-learning model.
 16. The computer-implemented method of claim 15, wherein the machine-learning model comprises at least one of a classifier, an artificial neural network, a support vector machine, a decision tree, or a regression model.
 17. The computer-implemented method of claim 15, wherein the machine-learning model defines reference or predicted methylation profiles against which the methylation profile for the filtered subset are compared to determine the likelihood that the person has the specified medical condition.
 18. The computer-implemented method of claim 1, wherein the determined likelihood that the person has the specified medical condition is used by a medical provider to assess whether to perform additional diagnostic testing on the person.
 19. The computer-implemented method of claim 1, wherein the determined likelihood that the person has the specified medical condition is used by a medical provider to at least one of diagnose the person or treat the person for the specified medical condition.
 20. The computer-implemented method of claim 1, wherein outputting the indication of the likelihood that the person has the specified medical condition comprises at least one of presenting the indication on an electronic display, audibly playing the indication through a speaker, storing the indication in a memory of a computing system for subsequent retrieval, or transmitting the indication in an electronic message to one or more users.
 21. The computer-implemented method of claim 1, wherein enriching the target nucleic acids in the filtered subset comprises generating the filtered subset so that a fraction of the target nucleic acids that occur in the filtered subset is greater than a fraction of the target nucleic acids that occur in the initial set of nucleic acids.
 22. The computer-implemented method of claim 1, wherein the filtered subset consists exclusively of the target nucleic acids.
 23. The computer-implemented method of claim 1, wherein the filtered subset comprises the target nucleic acids and non-targeted nucleic acids.
 24-25. (canceled)
 26. A computer-implemented method, comprising: obtaining, by a computing system, initial sequence data that describes sequences of an initial set of nucleic acids from a biological sample of a person, the initial set of nucleic acids including nucleic acids originating from a plurality of different tissues of the person; filtering, by the computing system, the initial sequence data to identify a first subset of sequences from the initial sequence data that correspond to a first pre-defined set of genomic regions; filtering, by the computing system, the initial sequence data to identify a second subset of sequences from the initial sequence data that correspond to a second pre-defined set of genomic regions; processing, by the computing system, data that includes an observed methylation profile of the first subset of sequences to generate a predicted methylation profile of the second subset of sequences; comparing, by the computing system, an observed methylation profile of the second subset of sequences to the predicted methylation profile of the second subset of sequences to determine whether the person has a specified medical condition, wherein the person is deemed to have the specified medical condition if a difference between the observed methylation profile of the second subset of sequences and the predicted methylation profile of the second subset of sequences meets a minimum difference criterion; and outputting, by the computing system, an indication of whether the person was determined to have the specified medical condition.
 27. The computer-implemented method of claim 26, wherein: the first pre-defined set of genomic regions are regions that exhibit a minimum level of stability with respect to at least one of a methylation characteristic or a copy number characteristic in a population of individuals; and the second pre-defined set of genomic regions are regions that exhibit a minimum difference with respect to at least one of the methylation characteristic or the copy number characteristic between a first sub-population of individuals who have the specified medical condition and a second sub-population of individuals who do not have the specified medical condition.
 28. The computer-implemented method of claim 26, wherein the first pre-defined set of genomic regions is a first reference set of genomic regions, and the second pre-defined set of genomic regions is a first target set of genomic regions; the method further comprising: selecting the first reference set of genomic regions as the first pre-defined set of genomic regions from a database that includes a plurality of reference sets of genomic regions, wherein different ones of the plurality of reference sets of genomic regions correspond to different medical conditions; and selecting the first target set of genomic regions as the second pre-defined set of genomic regions from the database, wherein the database further includes a plurality of target sets of genomic regions, wherein different ones of the plurality of target sets of genomic regions correspond to different medical conditions.
 29. The computer-implemented method of claim 26, wherein the specified medical condition is preeclampsia, endometriosis, ovarian cancer, necrotizing enterocolitis, or a brain condition.
 30-31. (canceled)
 32. A computer-implemented method, comprising: obtaining, by a computing system, initial sequence data that describes sequences of an initial set of nucleic acids from a biological sample of a person, the initial set of nucleic acids including nucleic acids originating from a plurality of different tissues of the person; filtering, by the computing system, the initial sequence data to identify a target subset of sequences from the initial sequence data that correspond to a pre-defined set of genomic regions; comparing, by the computing system, an observed methylation profile of the target subset of sequences to a pre-defined methylation profile to determine whether the person has a specified medical condition, wherein the person is deemed to have the specified medical condition if a difference between the observed methylation profile of the target subset of sequences and the pre-defined methylation profile meets a minimum difference criterion; and outputting, by the computing system, an indication of whether the person was determined to have the specified medical condition.
 33-34. (canceled)